Preface


Asset prices are dynamic, changing frequently whenever the financial markets 
are open. Some of us are curious about how and why these changes occur, while 
many people aspire to know where prices are likely to be at future times. In this 
book I describe how prices change and what we can learn about future prices. As 
financial markets are highly competitive, there are limits to how much guidance 
I can provide about why particular price changes occur and the precise level of 
future prices. 

Descriptions of past price changes and predictive statements about future prices 
usually rely on insights from mathematics, economics and behavioral theory. 
My emphasis in this book is on using statistical analysis and finance theory to 
learn from prices we have seen about the probabilities of possible prices in the 
future. 

Familiarity with financial, probabilistic, and statistical concepts is advisable 
before reading this book. A good introductory finance course will provide a 
satisfactory understanding of financial markets (including derivative securities), 
efficient market theory and the single-factor, capital asset pricing model. Quanti- 
tative courses that cover random variables, probability distributions, data analysis, 
regression models, and hypothesis testing are the minimum requirement. Mathe- 
matical knowledge and expertise are always an advantage, although I assume less 
prior study than the authors of most graduate texts. 

This book is written for students of economics, finance, and mathematics who 
are familiar with the above topics and who want to learn about asset price dynam- 
ics. It is also intended to provide practitioners and researchers with an accessible 
and comprehensive review of important theoretical and empirical results. 

I have taught almost all of the contents of this book, on a variety of under- 
graduate, postgraduate, doctoral, and executive courses. The topics selected and 
the mathematical depth of the exposition naturally depend upon the audience. 

My final-year, elective, undergraduate course at present includes a review of 
relevant probability theory (most of Chapter 3), a survey of the established facts 
about asset price changes (Chapter 4), a popular method for testing if price
changes are random (Chapter 5), an appraisal of trading rules (parts of
Chapter 7), an overview of volatility definitions and reasons for volatility changes
(Chapter 8), an introduction to the simplest and most often applied volatility mod- 
els (Chapter 9), a summary of results for prices recorded very frequently (parts 
of Chapter 12), a description of Black-Scholes option pricing formulae, implied
volatilities and risk-neutral pricing theory (Chapter 14, as far as Section 14.4),
and a review of volatility forecasting (some of Chapter 15). 

My core financial econometrics course for students taking a postgraduate 
degree in finance also includes additional volatility theory and models (parts 
of Chapters 10 and 11), option pricing when volatility changes (the remainder of 
Chapter 14), and methods that produce predictive distributions (parts of Chap- 
ter 16). A typical doctoral course covers most of Chapters 8-16. 

Any course will be more rewarding if students obtain new skills by analyzing 
market prices. Students should be encouraged to acquire data, to test random 
walk theories, to assess the value or otherwise of trading rules, to estimate a 
variety of volatility models, to study option prices, and to produce probabilities 
for possible ranges of future prices. I provide several Excel examples to facilitate 
the appropriate calculations. 

Educational resources can be downloaded from my website, as mentioned at 
the end of Chapter 1. I expect the website to be dynamic, with content that reflects 
correspondence with my readers. 

The topics covered in this book reflect interests that I have acquired and devel- 
oped during thirty years of research into market prices. My research has been 
inspired, influenced, and encouraged by very many people and I particularly wish 
to acknowledge the contributions made by Clive Granger, Robert Engle, Torben 
Andersen, Richard Baillie, Tim Bollerslev, Francis Diebold, Andrew Lo, Peter 
Praetz, Neil Shephard, and Richard Stapleton. 

My doctoral thesis, completed in 1978, contained analysis of commodity 
markets. Subsequently, most of my research has focused on stock and foreign 
exchange markets. Likewise, most of the examples in this book are for equity and 
currency price series. 

My longstanding interest in the predictability of asset prices is reflected in 
Chapters 5-7, which can be skipped by anyone who considers all nontrivial point
forecasts futile. My thesis contained embryonic volatility models, one of which
became the stochastic volatility model I published in 1982. Inspired by Robert 
Engle's simultaneous and path-breaking work on ARCH models, I also defined 
and analyzed the GARCH(1, 1) volatility model at about the same time that 
Tim Bollerslev was working independently on the general GARCH(p, q) model. 
Volatility models allow us to make informed predictions about future volatility. 
They are covered in depth in this book, especially in Chapters 8-12, 14, and 
15. Much more recently, researchers have used option prices to infer probability 
distributions for future asset price levels. This is covered in Chapter 16. 

Readers will soon notice that I refer to a considerable number of articles by 
other researchers. These citations reflect both the importance of research into 
financial market prices and the easy availability nowadays of the price data that 
are investigated by empirical researchers. A few papers, which I recommend as
an introduction to the relevant research literature, are listed at the end of most
chapters. 

While I have attempted to document empirical regularities and models that will 
stand “the test of time,” I expect important and exciting new results to continue to
appear in the years ahead. A good way to keep up to date is to read working papers 
at www.ssrn.com and papers published in the leading journals. Many of the most 
important papers for research into asset price dynamics, at the time of writing, 
appear in the Journal of Econometrics, the Journal of Finance, the Journal of 
Financial Economics, and the Review of Financial Studies. 

This book owes much to my wife, Sally, our children, Sarah, Katherine, 
and Adam, my publisher, Richard Baggaley, and my friends and colleagues at 
Lancaster University, particularly Mark Shackleton. I thank them all for their 
encouragement, advice, patience, and support. I also thank my copy-editor, 
Jon Wainwright, whose friendly collaboration and craftsmanship are much appre- 
ciated. 

I thank the many reviewers of my original proposal and my draft manuscript 
for their good advice, especially Neil Shephard and Martin Martens. Many of the 
results in this book were obtained during my collaborations with my cited co- 
authors: Xinzhong Xu, Ser-Huang Poon, Bevan Blair, Yuan-Chen Chang, Mark 
Shackleton, Nelson Areal, Xiaoquan Liu, Martin Martens, and Shiuyan Pong. 
I thank them all for their contributions to a deeper understanding of asset price 
dynamics. Finally, I thank Dean Paxson for his positive persistence in enquiring 
about my progress with this book. 
Introduction


1.1 Asset Price Dynamics 


Asset prices move as time progresses: they are dynamic. It is certainly very difficult 
to provide a correct prediction of future price changes. Nevertheless, we can make 
statements about the probability distributions that govern future prices. Asset price 
dynamics are statements that contain enough detail to specify the probability 
distributions of future prices. We seek statements that are empirically credible, 
that can explain the historical prices that we have already seen. 

Investors and fund managers who understand the dynamic behavior of asset 
prices are more likely to have realistic expectations about future prices and the 
risks to which they are exposed. Quantitative analysts need to understand asset 
price dynamics, so that they can calculate competitive prices for derivative secu- 
rities. Finance researchers who explore hypotheses about capital markets often 
need to consider the implications of price dynamics; for example, hypothesis tests 
about price reactions to corporate events should be made robust against changes 
in price volatility around these events. 

Explaining how prices change is a very different task to explaining why they 
change. We will encounter many insights into how prices change that rely on 
the empirical analysis of prices. Many general explanations for price changes 
can be offered: relevant news about the asset and its cash flows, macroeconomic 
news, divergent beliefs about the interpretation of news, and changes in investor 
sentiment. It seems, however, to be impossible to provide specific explanations 
for most price changes. 


1.2 Volatility 


A striking feature of asset prices is that they move more rapidly during some 
months than during others. Prices move relatively slowly when conditions are 
calm, while they move faster when there is more news, uncertainty, and trading. 
The volatility of prices refers to the rate at which prices change. Commentators 
and traders define this rate in several ways, primarily by the standard deviation 
of the return obtained by investing in an asset. Risk managers are particularly
interested in measuring and predicting volatility, as higher levels imply a higher
chance of a large adverse price change.

Figure 1.1. A year of S&P 500 index levels.


1.3 Prediction 


Predictions concerning future prices are obtained from conditional probability 
distributions that depend on recent price information. Three prediction problems 
are addressed in this book. The first forecasting question posed by most people is, 
Which way will the price go, up or down? However hard we try, and as predicted by 
efficient market theory, it is very difficult to obtain an interesting and satisfactory 
answer by considering historical prices. A second question, which can be answered 
far more constructively, is, How volatile will prices be in the future? The rate at 
which prices change is itself dynamic, so that we can talk of extreme situations 
such as turbulent markets (high volatility) and tranquil markets (low volatility). 
The level of volatility can be measured and predicted, with some success, using 
either historical asset prices or current option prices. A third and more ambitious 
question is to ask for the entire probability distribution of a price several time 
periods into the future. This can be answered either by Monte Carlo simulation of 
the assumed price dynamics or by examining the prices of several option contracts. 


1.4 Information 


There are several sources of information that investors can consider when they 
assess the value of an asset. To value the shares issued by a firm, investors may 
be interested in expectations and measures of risk for future cash flows, interest 
rates, accounting information about earnings, and macroeconomic variables that 
provide information about the state of the economy. These specific sources of
information are generally ignored in this text, because my objective is not to
explain how to price assets. Relevant information is not ignored by traders, who
competitively attempt to incorporate it into asset prices. Competition between
traders is often assumed in finance research to be sufficient to ensure that prices
very quickly reflect a fair interpretation of all relevant information.

Figure 1.2. A year of VIX observations.

The prices of financial assets and their derivative securities are the information 
that we consider when making statements about future asset prices. Our typical 
information is a historical record of daily asset prices, supplemented in the later 
chapters by more frequent price observations and by recent option prices. Fig- 
ure 1.1 shows a year of daily closing levels for the Standard & Poor 500-share 
index, from June 2003 until May 2004. These numbers could be used at the end 
of May to answer questions like, What is the chance that the index will be above 
1200 at the end of June? Figure 1.2 shows daily observations during the same year 
for an index of volatility for the S&P 500 index, called VIX, that is calculated 
from option prices. These numbers are useful when predicting the future volatility 
of the US stock market. 
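
As a rough illustration of the kind of question posed above, the following sketch estimates the chance that an index finishes above a threshold by Monte Carlo simulation. It assumes, purely for illustration, that daily log returns are independent and normal with zero mean and constant volatility. The starting level, threshold, volatility, and horizon are hypothetical inputs rather than values read from Figure 1.1, and the function names are invented for this example. A closed-form answer under the same assumptions is included as a check on the simulation.

```python
import math
import random
from statistics import NormalDist


def prob_above_mc(start, threshold, daily_vol, days, n_paths=20000, seed=1):
    """Monte Carlo estimate of P(price after `days` trading days > threshold).

    Assumes i.i.d. normal daily log returns with zero mean: an illustrative
    simplification, not a model recommended by the text.
    """
    rng = random.Random(seed)
    log_threshold = math.log(threshold)
    hits = 0
    for _ in range(n_paths):
        log_price = math.log(start)
        for _ in range(days):
            log_price += rng.gauss(0.0, daily_vol)
        if log_price > log_threshold:
            hits += 1
    return hits / n_paths


def prob_above_exact(start, threshold, daily_vol, days):
    """Closed-form probability under the same assumptions, used as a check."""
    z = math.log(threshold / start) / (daily_vol * math.sqrt(days))
    return 1.0 - NormalDist().cdf(z)


# Hypothetical inputs: level 1120, threshold 1200, 1% daily volatility, 21 days.
estimate = prob_above_mc(1120.0, 1200.0, 0.01, 21)
exact = prob_above_exact(1120.0, 1200.0, 0.01, 21)
```

The simulation becomes more useful when the assumed dynamics have no convenient closed form, for example when volatility itself changes through time.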

Studying daily price data and probability models provides a good introduction 
to asset price dynamics, so we focus on daily data in Chapters 4-11. More can be 
learnt from more-frequent price observations, as we will later see in Chapters 12 
and 15. Option prices are also informative about future asset prices and their study 
requires models that are specified for a continuous time variable, as in Chapters 13 
and 14. 


1.5 Contents 


The book is divided into five parts, which follow this introductory chapter.
The first part provides a foundation for the empirical modeling of time series
of returns from financial assets. Chapter 2 explains how returns from investments 
are calculated from prices. A set of regularly observed prices can be used to 
define a time series of returns. Several examples are presented and advice is 
given about data-collection issues. Chapter 3 commences with a summary of the 
theoretical properties of random variables. It then continues with the definitions 
and properties of important probability models for time-ordered sequences of 
random variables, called stochastic processes. Consideration is given to a variety 
of stochastic processes that are used throughout the book to develop descriptions 
of the dynamic behavior of asset prices. 

Chapter 4 surveys general statistical properties of time series of daily returns 
that are known as stylized facts. Any credible stochastic process that represents 
asset price dynamics must be able to replicate these facts. Three stylized facts 
are particularly important. First, the distribution of returns is not normal. Second, 
the correlation between today's return and any subsequent return is almost zero. 
Third, there are transformations of returns that reveal positive correlation between 
observations made at nearby times; an example is provided by the absolute values 
of returns. 

The second part presents methods and results for tests of the random walk 
and efficient market hypotheses. The random walk hypothesis asserts that price 
changes are in some way unpredictable. Chapter 5 defines and evaluates the pop- 
ular variance-ratio test of the hypothesis, which relies on a comparison between 
the variances of single-period and multi-period returns. It is followed in Chapter 6 
by several further tests, which use a variety of methods to look for evidence that 
tomorrow's return is correlated with some function of previous returns. Evidence 
against the random walk hypothesis is found that is statistically significant but 
not necessarily of economic importance. Chapter 7 evaluates the performance
of trading rules and uses their results to appraise the weak form of the efficient 
market hypothesis. These rules would have provided valuable information about 
subsequent prices in past decades, but their usefulness may now have disappeared. 

The third part covers the dynamics of discrete-time asset price volatility. Chap- 
ter 8 summarizes five interpretations of volatility, all of which refer to the standard 
deviation of returns. It then reviews a variety of reasons for volatility changes, 
although these can only provide a partial explanation of this phenomenon. Chap- 
ter 9 defines ARCH models and provides examples based upon some of the most 
popular specifications. These models specify the conditional mean and the con- 
ditional variance of the next return as functions of the latest return and previous 
returns. They have proved to be highly successful explanations of the stylized facts 
for daily returns. Chapter 10 describes more complicated ARCH models and the 
likelihood theory required to perform hypothesis tests about ARCH parameters. 
Guidance concerning model selection is included, based upon tests and diagnostic
checks. Chapter 11 is about stochastic volatility models, which are also able to
explain the stylized facts. These models represent volatility as a latent and hence 
unobservable variable. Information about the dynamic properties of volatility 
can then be inferred by studying the magnitude of returns and by estimating the 
parameters of specific volatility processes. 

The fourth part describes high-frequency prices and models in Chapter 12. The 
returns considered are now far more frequent than the daily returns of the preced- 
ing chapters. Many examples are discussed for returns measured over five-minute 
intervals. Their stylized facts include significant variations in the average level of 
volatility throughout the day, some of which can be explained by macroeconomic 
news announcements. The additional information provided by intraday returns 
can be used to estimate and forecast volatility more accurately. 

The fifth and final part presents methods that use option prices to learn more 
about future price distributions. Most option pricing models depend on assump- 
tions about the continuous-time dynamics of asset prices. Some important con- 
tinuous-time stochastic processes are defined in Chapter 13 and these are used to 
represent the joint dynamics of prices and volatility. Option pricing models are 
then discussed in Chapter 14 for various assumptions about volatility: constant, 
stochastic, or generated by an ARCH model. The empirical properties of implied 
volatilities are discussed, these being obtained from observed asset and option 
prices by using the Black-Scholes formulae. Chapter 15 compares forecasts of 
future volatility. Forecasts derived from option-implied volatilities and intraday 
asset prices are particularly interesting, because they incorporate more volatility 
information than the historical record of daily prices and often provide superior 
predictions. 

Chapter 16 covers methods for obtaining densities for an asset price at a later 
date, with a particular emphasis on densities estimated using option prices. Sev- 
eral methods for obtaining risk-neutral densities from options data are described. 
These densities assume that risk is irrelevant when future cash flows are priced. 
Consequently, they are transformed to produce asset price densities that incorpo- 
rate risk aversion. 


1.6 Software 


Some of the most important calculations are illustrated using Excel spreadsheets 
in Sections 5.4, 7.6, 9.4, 9.8, 11.4, 11.7, 14.3, and 16.10. Excel is used solely 
because this software will be available to and understood by far more readers 
than alternatives, such as Eviews, Gauss, Matlab, Ox, and SAS. Some of these 
alternatives contain modules that perform many useful calculations, such as the 
estimation of ARCH models, and it should be a straightforward task to recode any 
of the examples. The spreadsheets use several Excel functions that are explained
by Excel’s Help files. More elegant spreadsheets can be obtained by using the
Visual Basic for Applications (VBA) programming language. 


1.7 Web Resources 


Additional information, including price data, end-of-chapter questions, and
instructions about sending email to the author, is available online. Some of the
questions are empirical, others are mathematical. For all web material, first go to 


http://pup.princeton.edu/titles/8055.html 


and then follow the link to the author's web pages. 


Part I 


Foundations 


2 


Prices and Returns 


Methods for creating time series of market prices and returns to investors are 
described and illustrated in this chapter. 


2.1 Introduction 


Any empirical investigation of the behavior of asset prices through time requires 
price data. Some questions to be answered are, Where will we find our data?, 
How many years of data do we want to analyze?, and How many prices for each 
year do we wish to obtain? Advice on these topics and other data-collection issues 
is provided in Section 2.3, after first presenting two representative examples of 
price series in Section 2.2. 

Almost all empirical research analyzes returns to investors rather than prices. 
Returns are more appropriate for several reasons. The most important is that 
returns, unlike prices, are only weakly correlated through time. Time series of 
prices and dividends can be converted into time series of returns using two distinct 
definitions, which are explained in Section 2.5. Our preferred definition is that 
returns equal changes in the logarithms of prices, with appropriate adjustments 
when dividends are distributed. The definitions of returns are preceded by two 
examples in Section 2.4 and followed by a summary of twenty further time series 
of returns in Section 2.6. 


2.2 Two Examples of Price Series 


A time series is a set of observations arranged in time order. Figures 2.1 and 2.2 are 
examples of daily time series, respectively, for a portfolio of large US firms and an 
exchange rate. Both series contain one number for each trading day from January 
1991 to December 2000 inclusive. These series were obtained from Datastream. 
They can be downloaded from the website mentioned near the end of Chapter 1. 
They are used to illustrate Excel calculations in Chapters 5, 7, 9, and 11. 

The stock picture shows the daily closing level of the Standard & Poor 100- 
share index. This index does not include dividend payments. There are 2531 index 
levels in this ten-year series, as there are no observations for Saturdays, Sundays, 
and holidays. Investors earned high returns from US stock market investments
during this period. The series commences at 153 and ends at 686. The US market
commenced a sharp fall in 2000 from the peak level of 833, which continued for
a further two years beyond the end of the series. We will refer to index levels as
prices whenever this is convenient.

Figure 2.1. Levels of the Standard & Poor 100-share index.

Figure 2.2. DM/$ exchange rates.

The exchange rate picture shows the number of Deutsche marks (DM) that 
could be purchased for one dollar at the interbank spot market. This series has 2591 
observations, recorded at 12:00 local time in New York, which range from 1.35 
to 2.36. The only days excluded are weekend days, 25 December, and 1 January. 
The DM/$ rate from 1999 onwards is calculated from the Euro/$ rate and the 
fixed rate for converting German currency into euros. 


2.3 Data-Collection Issues 


Time series of prices can be obtained from many sources, including websites, 
commercial vendors, university research centers, and financial markets. Table 2.1 
lists web addresses for a variety of data providers.

Table 2.1. Sources of price time series.

Source                  Web address                        Markets
CRSP                    www.crsp.com                       US stocks
Commodity Systems Inc   www.csidata.com                    Futures
Datastream              www.datastream.com/product/has/    Stocks, bonds, currencies, etc.
IFM                     www.theifm.org                     Futures, US stocks
Olsen & Associates      www.olsen.ch                       Currencies, etc.
Trades and Quotes DB    www.nyse.com/marketinfo            US stocks
US Federal Reserve      www.federalreserve.gov/releases/   Currencies, etc.
Yahoo!                  biz.yahoo.com/r/                   Stocks, many countries


The majority of sources provide daily data, such as end-of-day prices. Free 
daily data are available at several websites. For example, at the time of writing, 
Yahoo! provides long time series of equity returns for many countries. Free data 
may be less accurate than those provided by vendors such as Datastream and the
Center for Research in Security Prices (CRSP). Datastream sells data for all the 
major asset classes at all important markets, while CRSP sells price records for 
every stock listed on the major US stock exchanges. Daily futures prices are sold 
by several organizations, including the Institute for Financial Markets (IFM). 

More skill is required to analyze transactions data, such as the time and price 
of every trade. Vast amounts of transactions data can be bought: for US equities, 
from the Trades and Quotes database owned by the New York Stock Exchange; 
for foreign exchange rates and other assets, from Olsen & Associates; and for 
futures contracts, from IFM. 


2.3.1 Frequency 


The appropriate frequency of observations in a price series depends on the data 
available and the questions that interest a researcher. The time interval between 
prices ought to be sufficient to ensure that trade occurs in most intervals and it is 
preferable that the volume of trade is substantial. Very often, selecting daily prices 
will be both appropriate and convenient. Consequently, we focus on the daily fre- 
quency in Chapters 2-11. A series of daily prices contains useful information that 
may be missing in a series of weekly or monthly prices. The additional informa- 
tion increases the power of hypothesis tests, it improves volatility estimates, and 
it is essential for evaluations of trading rules. 

The number of observations in a time series of daily prices should be sufficient 
to permit powerful tests and accurate estimation of model parameters. Experi- 
ence shows that at least four years of daily prices (more than 1000 observations) 
are often required to obtain interesting results; however, eight or more years of 
prices (more than 2000 observations) should be analyzed whenever possible. The
best statistical model and the parameters of any preferred model are, of course,
more likely to change as the number of observations increases. Very long price 
series, spanning several decades, can always be subdivided and then results can 
be compared across subperiods. 

Analysis of prices recorded more frequently than once a day must take account 
of the uneven flow of transactions during the day, which creates intraday effects. 
It is also often necessary to consider the consequences of trade prices bouncing 
between bid and ask quotations. Interesting conclusions can be obtained from 
high-frequency prices, as will be shown in Chapter 12. 


2.3.2 Price Definitions 


Several choices may be made when daily prices are recorded and it can be impor- 
tant to know how a data vendor defines a representative price for the day. The 
price is usually some type of closing price. It may be a bid price, an ask price, 
or an average. It may be either the final transaction price of the day or the final 
quotation. It may be a settlement price at a futures market in which case it usually 
equals the average price over the final seconds or transactions of the day's busi- 
ness. Few markets now limit how far prices can move within a trading session, 
but for such markets it is advisable to identify any prices that are constrained by 
limit regulations. 

Price series should be defined using one price per period, recorded at a con- 
stant point within the period such as the market's close. Spurious correlation can 
occur in “returns” if this convention is ignored. A substantial, positive correlation
between consecutive “returns” is created if a weekly average of daily prices is 
studied (Working 1960), or if the average of the day’s high and low prices is used 
or even simply the day’s high (Daniels 1966; Rosenberg 1970). 
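
The averaging effect can be checked by simulation. The sketch below generates a random walk for log prices, so that true returns are uncorrelated by construction, and then compares the first-order autocorrelation of "returns" computed from weekly closing prices with that of "returns" computed from weekly averages of daily prices. The five-day week, the 1% daily volatility, the seed, and all function names are assumptions made for this illustration.

```python
import math
import random


def lag1_autocorr(x):
    """Sample first-order autocorrelation of a sequence."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


def simulate_prices(n_days, daily_vol=0.01, seed=7):
    """Prices whose logarithms follow a random walk with normal steps."""
    rng = random.Random(seed)
    log_p, prices = 0.0, []
    for _ in range(n_days):
        log_p += rng.gauss(0.0, daily_vol)
        prices.append(math.exp(log_p))
    return prices


def log_changes(series):
    """Changes in the logarithms of consecutive values."""
    return [math.log(b / a) for a, b in zip(series, series[1:])]


week = 5
prices = simulate_prices(week * 20000)
weeks = [prices[i:i + week] for i in range(0, len(prices), week)]

closing = [w[-1] for w in weeks]            # one price at a fixed point in each week
averaged = [sum(w) / week for w in weeks]   # weekly average of daily prices

r_close = lag1_autocorr(log_changes(closing))
r_avg = lag1_autocorr(log_changes(averaged))
```

Under these assumptions `r_close` should be close to zero, while `r_avg` should be clearly positive, even though the simulated returns contain no dependence at all.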


2.3.3 Additional Information 


In addition to daily closing prices, it is often possible to obtain daily open, high 
and low prices, and daily trading volume. This information is routinely recorded 
by futures markets. High and low prices can be used to improve estimates of price 
volatility (Parkinson 1980; Garman and Klass 1980). Trading volume can be used 
to decide if there is a “thin trading” problem. Many instances of zero or low volume
imply that published prices may not describe the prices at which transactions could
have been agreed. Many occasions of identical prices on consecutive days are also
indicative of thin trading.
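
To indicate how high and low prices might be used, the sketch below implements the range-based variance estimate in the form usually attributed to Parkinson (1980): the average of the squared logarithm of the high/low ratio, divided by 4 ln 2. This form presumes continuous trading and no drift. The function name and the sample prices are hypothetical.

```python
import math


def parkinson_variance(highs, lows):
    """Range-based estimate of the per-period variance of log returns.

    Computes mean(ln(H/L)^2) / (4 ln 2), the form usually attributed to
    Parkinson (1980); it presumes continuous trading and no drift.
    """
    if len(highs) != len(lows) or not highs:
        raise ValueError("need equally long, non-empty high and low series")
    total = sum(math.log(h / l) ** 2 for h, l in zip(highs, lows))
    return total / (4.0 * math.log(2.0) * len(highs))


# Hypothetical daily highs and lows for three trading days.
highs = [101.2, 102.5, 101.9]
lows = [99.8, 100.9, 100.4]
daily_vol = math.sqrt(parkinson_variance(highs, lows))
```

The square root converts the variance estimate into a daily standard deviation, which is the sense in which volatility is used throughout the text.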

Some sources provide both bid and ask prices. Large bid-ask spreads can be
another indicator of a thin market. Assets that are traded frequently usually have 
very small spreads and the average of the bid and ask prices can then be used to 
define the closing price.

Figure 2.3. Ten years of S&P 100 daily returns.


2.3.4 Futures Prices


Long series of futures prices require several different contracts to be used. This 
causes few problems provided that the derived series of returns contains one return
per trading period and each return is calculated using two prices for the same 
contract. Sellers of futures deliver goods to buyers on one or more days of the 
delivery month named in the contract. It is advisable to avoid using any of the 
final prices from a contract that are atypical, perhaps because of delivery options. 
After excluding such prices, it is conventional to take prices from the contract 
nearest to delivery to reduce any possibility of thin trading. 
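
The requirement that each return uses two prices for the same contract can be sketched as follows. The data layout is an assumption made for this example: each contract contributes one list of consecutive daily prices, and each later list begins on the day on which the previous list ends, so that the roll day is priced on both contracts. Log returns are used, in line with the definition preferred in this chapter.

```python
import math


def spliced_returns(segments):
    """Log returns from consecutive contracts, never mixing two contracts.

    `segments` is a hypothetical layout: one list of daily prices per
    contract, in time order, where each later list starts on the day the
    previous list ends (the roll day).
    """
    returns = []
    for prices in segments:
        for prev, curr in zip(prices, prices[1:]):
            returns.append(math.log(curr / prev))
    return returns


# Hypothetical prices: the series rolls to the second contract on day 3.
first_contract = [100.0, 101.0, 102.0]    # days 1 to 3
second_contract = [110.0, 111.0, 112.0]   # days 3 to 5
rets = spliced_returns([first_contract, second_contract])
```

Four returns are produced for five days, and the gap between 102 and 110 on the roll day never enters a return, because those two prices belong to different contracts.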


2.3.5 Error Checks 


Data providers do make mistakes and some providers supply data in a form that 
includes dates and prices when markets are closed. The correct number of daily 
prices in one year depends on closures for holidays and weekends. Too few prices
indicate that the source has overlooked some days, or that trading has been
suspended, while too many prices imply that numbers have been invented for
holidays and/or weekends. A substantial error in one price can often be identified
by listing all large percentage changes from one day to the next, followed by 
looking for pairs of consecutive days in the list. Large percentage changes can 
often be checked against a second source. For futures, a useful second source is 
often given by the spot market or a contract with a different delivery month. 
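
The check for a single erroneous price might be coded along the following lines. The 10% threshold is an arbitrary choice for illustration, and the extra condition that the two large changes have opposite signs is an assumption added here, on the reasoning that one mistyped price produces a jump followed immediately by a reversal. Any price flagged in this way should still be checked against a second source.

```python
def large_changes(prices, threshold=0.10):
    """Indices t at which |p[t] / p[t-1] - 1| exceeds the threshold."""
    return [t for t in range(1, len(prices))
            if abs(prices[t] / prices[t - 1] - 1.0) > threshold]


def suspect_prices(prices, threshold=0.10):
    """Indices of prices flanked by two large changes of opposite sign.

    A jump into a price followed at once by a jump back out is the pattern
    that one mistyped price would produce.
    """
    flagged = set(large_changes(prices, threshold))
    suspects = []
    for t in sorted(flagged):
        if t + 1 in flagged:
            change_in = prices[t] - prices[t - 1]
            change_out = prices[t + 1] - prices[t]
            if change_in * change_out < 0:
                suspects.append(t)
    return suspects


# Hypothetical series containing one mistyped price at index 2.
prices = [100.0, 101.0, 150.0, 102.0, 103.0]
flagged_days = large_changes(prices)
suspects = suspect_prices(prices)
```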


2.4 Two Returns Series 


An investor who owns an asset throughout a trading period obtains a return on 
investment that depends on the initial price, the final price and any dividend 
payments. One way to measure the return on investment is \((p_t - p_{t-1})/p_{t-1}\),
where \(p_t\) is the price for time period \(t\) and dividend payouts are ignored.
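
A minimal sketch of this calculation follows, using hypothetical closing prices and ignoring dividends as in the formula above. The function name is invented for this example.

```python
def simple_returns(prices):
    """Simple returns (p_t - p_{t-1}) / p_{t-1}, ignoring dividends."""
    return [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]


# Hypothetical closing prices for four consecutive trading days.
closes = [100.0, 102.0, 99.96, 99.96]
rets = simple_returns(closes)
```

A series of n prices yields n - 1 returns, here a rise of 2%, a fall of 2%, and a day with no change.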
Figure 2.4. Ten years of DM/$ daily returns.


Figures 2.3 and 2.4 show time series of daily returns calculated from the prices 
that are plotted in Figures 2.1 and 2.2. These returns vary substantially around 
their average levels, which are close to zero. The S&P 100 index returns appear to 
vary more in the second half of the series. Their extreme values are the maximum 
of 5.8% and the minimum of -7.2%. The corresponding extremes for the DM/$
returns are 3.5% and -3.9%.

Time series plots of returns display an important feature that is usually called 
volatility clustering. This empirical phenomenon was first observed by Man- 
delbrot (1963), who said of prices that “large changes tend to be followed by 
large changes—of either sign—and small changes tend to be followed by small 
changes.” Volatility clustering describes the general tendency for markets to have 
some periods of high volatility and other periods of low volatility. High volatility 
produces more dispersion in returns than low volatility, so that returns are more 
spread out when volatility is higher. A high volatility cluster will contain several 
large positive returns and several large negative returns, but there will be few, if 
any, large returns in a low volatility cluster. 

Clustering effects for the S&P 100 index are more clearly seen in Figure 2.5 
for the daily returns from March 1998 to February 1999. This figure shows a 
high volatility cluster from late August until mid October, with less volatility 
before August than there is after October. Likewise, Figure 2.6 shows a year of 
DM/$ returns that have six months of high volatility, which are both preceded and 
followed by three months of much lower volatility. 


2.5 Definitions of Returns 


Statistical analysis of market prices is more difficult than analysis of changes in
prices. This is because consecutive prices are highly correlated but consecutive
changes have very little correlation, if any. Consequently, it is more convenient
to investigate suitable measures of changes in prices.

Figure 2.5. One year of S&P 100 daily returns.

Figure 2.6. One year of DM/$ daily returns.

Returns to an investor are the preferred way to measure price changes. Returns 
can be defined by changes in the logarithms of prices, with appropriate adjustments 
for any dividend payments. We apply this definition to stocks, stock indices, 
exchange rates, and futures contracts. 


2.5.1 Stock Returns 


Let \(p_t\) be a representative price for a stock in period \(t\). Usually this price will be
either the final transaction price or the final quotation during the period. Initially,
let us assume that the buyer pays the seller immediately for stock bought. Suppose
\(d_t\) is the present value of dividends, per share, distributed to those people who
own stock during period \(t\). On almost all days there are no dividend payments and
then \(d_t = 0\). Sometimes dividend payments are simply ignored, so then \(d_t = 0\)
for all days \(t\).

Three price change quantities appear in empirical research: 


\[ r_t^{*} = p_t + d_t - p_{t-1}, \]
\[ r_t' = (p_t + d_t - p_{t-1})/p_{t-1}, \tag{2.1} \]
\[ r_t = \log(p_t + d_t) - \log(p_{t-1}). \tag{2.2} \]


The first differences \(r_t^{*}\) are the payoff from buying one share at time \(t - 1\) and
then selling it at time \(t\), ignoring transaction costs and any differences between
buying and selling prices. First differences depend on the price units (e.g. dollars 
or cents) and thus comparisons across assets are not straightforward. They have 
the further disadvantage that their variances are proportional to the price level. 
For these reasons, first differences cannot be recommended. 

One dollar invested in shares at time \(t - 1\) buys \(1/p_{t-1}\) shares. The total dollar
proceeds from selling these shares at time \(t\) plus any dividends received equals

\[ (p_t + d_t)/p_{t-1} = 1 + r_t'. \]

Clearly, \(r_t'\) is the one-period return on investment for period \(t\).

The interest rate equivalent to \(r_t'\) when interest is paid \(n\) times in one period is
the number \(i_n\) that solves \((1 + (i_n/n))^n = 1 + r_t'\). The equivalent continuously
compounded rate is given by the limit of \(i_n\) as \(n \to \infty\). This limit is

\[ \log(1 + r_t') = \log((p_t + d_t)/p_{t-1}) = r_t. \]

Here, and throughout this book, “log” refers to the natural logarithm. We see that
\(r_t\) is the continuously compounded return for period \(t\).

The return measures \(r_t\) and \(r_t'\) are very similar numbers, since

\[ 1 + r_t' = \exp(r_t) = 1 + r_t + \tfrac{1}{2} r_t^2 + \cdots \]

and very rarely are daily returns outside the range from −10% to 10%. Some
people prefer to study the continuously compounded return \(r_t\), others prefer the
simple return \(r_t'\). It would be surprising if an important conclusion depended on
the choice. This book documents several results for continuously compounded
returns and generally \(r_t\) is called the return for period \(t\).
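
As a rough numerical illustration of the three price change quantities, the following sketch evaluates them for one hypothetical pair of prices. The function names and the prices are ours, chosen only for the example.

```python
import math


def first_difference(p_prev, p, d=0.0):
    """Payoff from holding one share: p_t + d_t - p_{t-1}."""
    return p + d - p_prev


def simple_return(p_prev, p, d=0.0):
    """Equation (2.1): (p_t + d_t - p_{t-1}) / p_{t-1}."""
    return (p + d - p_prev) / p_prev


def log_return(p_prev, p, d=0.0):
    """Equation (2.2): log(p_t + d_t) - log(p_{t-1})."""
    return math.log(p + d) - math.log(p_prev)


# Hypothetical prices on two consecutive days, with no dividend.
p_prev, p, d = 50.0, 50.6, 0.0
r_simple = simple_return(p_prev, p, d)
r_log = log_return(p_prev, p, d)

print(first_difference(p_prev, p, d))
print(r_simple, r_log)
# The simple and continuously compounded returns differ by about r^2 / 2.
print(r_simple - r_log, 0.5 * r_log ** 2)
```

For a daily change of about one percent the two return measures agree to roughly four decimal places, which is why the choice between them rarely matters.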

The primary reason for selecting the continuously compounded definition is 
that multi-period returns are then sums of single-period returns. For example, the 
proceeds from investing one dollar in stock at time \(t - 1\) followed by selling at
time \(t + 1\) are

\[ \exp(r_t + r_{t+1}) = (1 + r_t')(1 + r_{t+1}') \]

and thus two-period returns (ignoring dividends) are

\[ r_{t+1,2} = \log(p_{t+1}) - \log(p_{t-1}) = r_t + r_{t+1} \]

and

\[ r_{t+1,2}' = (p_{t+1} - p_{t-1})/p_{t-1} = r_t' + r_{t+1}' + r_t' r_{t+1}'. \]


The former equation provides simpler theoretical results than the latter equation 
for two-period returns, and likewise for general multi-period returns. 
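
The additivity of continuously compounded returns, and the extra cross-product term for simple returns, can be checked with a few lines of Python. The three prices below are hypothetical.

```python
import math

# Hypothetical prices at times t-1, t and t+1 (dividends ignored).
prices = [100.0, 104.0, 98.8]

log_r = [math.log(b / a) for a, b in zip(prices, prices[1:])]
simple_r = [b / a - 1.0 for a, b in zip(prices, prices[1:])]

two_period_log = math.log(prices[2] / prices[0])
two_period_simple = prices[2] / prices[0] - 1.0

# Continuously compounded returns simply add up ...
print(two_period_log, sum(log_r))
# ... whereas simple returns need the cross-product term.
print(two_period_simple,
      simple_r[0] + simple_r[1] + simple_r[0] * simple_r[1])
```

Here the simple returns are 4% and −5%, and their sum (−1%) misses the two-period simple return (−1.2%) by the product term.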

The definitions of returns ignore inflation and thus give nominal results. Real 
returns are nominal returns minus adjustments for inflation but these cannot be 
estimated satisfactorily for short periods of time. Consequently, it is conventional 
to study nominal returns, which we do throughout this book. 


2.5.2 Delayed Settlement 


Stock transactions are often settled a few days after the price is agreed. A stock 
market price is then, strictly speaking, a forward price for a later date. This fact is 
usually ignored when calculating returns. Its relevance has diminished in recent 
years as settlement periods have been reduced. The importance of the settlement 
issue for older datasets is discussed by Lakonishok and Levi (1982), Condoyanni, 
O'Hanlon, and Ward (1987), and Solnik (1990). 

To quantify the impact of delayed settlement, suppose a transaction at price \(p_t\)
in period \(t\) is settled \(c_t\) calendar days later. Also, suppose the relevant continuously
compounded daily interest rate is \(i_t\) during the settlement period. The spot price
\(s_t\) for immediate settlement in period \(t\) would satisfy

\[ p_t = s_t \exp(c_t i_t) \]

if both spot and forward deals were possible, arbitrage profits cannot be made,
and there are no dividend payments or transaction costs. Also assuming interest
rates are the same at times \(t - 1\) and \(t\), it can be deduced that the forward return
\(r_{t,f} = \log(p_t/p_{t-1})\) equals the spot return \(r_{t,s} = \log(s_t/s_{t-1})\) plus a term that
only involves interest rates and settlement dates:

\[ r_{t,f} = r_{t,s} + (c_t - c_{t-1}) i_t. \]


Thus forward and spot returns are very similar when settlement is after a constant 
number of business days, but they can differ by more when trades are settled on 
fixed days during trading accounts (see, for example, Solnik 1990). 
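
A small sketch shows the size of the settlement adjustment. The interest rate, the prices and the settlement lags are hypothetical, and the function name is ours; the calculation simply applies the no-arbitrage relation between market and spot prices stated above.

```python
import math


def forward_return(spot_prev, spot, c_prev, c, daily_rate):
    """Return log(p_t / p_{t-1}) when p = s * exp(c * i) on both days."""
    p_prev = spot_prev * math.exp(c_prev * daily_rate)
    p = spot * math.exp(c * daily_rate)
    return math.log(p / p_prev)


daily_rate = 0.05 / 365            # assumed 5% per annum, continuously compounded
spot_prev, spot = 100.0, 101.0     # hypothetical spot prices
r_spot = math.log(spot / spot_prev)

# Settlement after a constant number of days: no adjustment at all.
r_fwd_same = forward_return(spot_prev, spot, 3, 3, daily_rate)

# Settlement lag jumps from 3 to 14 days, as can happen when trades are
# settled on fixed dates and a new account period begins.
r_fwd_jump = forward_return(spot_prev, spot, 3, 14, daily_rate)

print(r_spot, r_fwd_same, r_fwd_jump)
print(r_fwd_jump - r_spot, 11 * daily_rate)
```

With these numbers the jump in the settlement lag adds about 0.15% to the measured return, which is not negligible relative to a typical daily return.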


2.5.3 Stock Indices 


Stock indices are typically weighted averages of the prices of the component 
stocks. The same formulae as before are used to calculate returns from index 
levels. Very often dividends are excluded from the index. They are often also 
excluded from return calculations, perhaps because of the effort required to aggregate
the dividends from all the component stocks. The composition of most indices
changes occasionally, so that a long time series will not be made up of returns 
from a homogeneous asset. 


2.5.4 Spot Currency Returns 


Now suppose \(p_t\) is the dollar price in period \(t\) for one unit of foreign currency, say
euros. A euro investment earns a dividend in the form of euro interest payments.
Let \(i_{t-1}^{*}\) be the continuously compounded interest rate for deposits in foreign
currency from time \(t - 1\) until time \(t\). Then one dollar used to buy \(1/p_{t-1}\) euros in
period \(t - 1\), which are sold with accumulated interest in period \(t\), gives proceeds
equal to \(p_t \exp(i_{t-1}^{*})/p_{t-1}\) and hence the return is

\[ r_t = \log(p_t) - \log(p_{t-1}) + i_{t-1}^{*}. \tag{2.3} \]


Researchers often ignore the foreign interest rate in this definition, in which case 
the numbers studied are logarithmic price changes rather than returns. The interest 
rate term is, however, very small compared with the magnitude of typical daily 
logarithmic price changes. 


2.5.5 Futures Returns 


Next suppose \(f_{t,T}\) is the futures price in period \(t\) for delivery or cash settlement
in some later period \(T\). As there are no dividend payouts on futures contracts, it
is conventional to define the futures return to be the logarithmic price change:

\[ r_t = \log(f_{t,T}) - \log(f_{t-1,T}). \tag{2.4} \]


We will follow this convention. Note that the same futures contract (and hence the 
same delivery date) should be used to obtain the two prices used in this definition. 

Capital is not required to trade futures. Only a margin deposit is required and this 
could be a security which pays interest to the party making the margin deposit. 
Consequently, the logarithmic price change can no longer be interpreted as a 
return on an investment. Nevertheless, to simplify terminology we will use the 
word “return” when referring to a change in the logarithm of a futures price. 

Many goods, such as currencies, have spot and futures prices that are tied to 
each other by the impossibility of making arbitrage profits. A typical theoretical 
equation that relates the futures price \(f_{t,T}\) to the spot price \(p_t\) is then

\[ f_{t,T} = p_t \exp((i_t - i_t^{*})(T - t)) \]

with \(i_t\) and \(i_t^{*}\), respectively, the one-period domestic and foreign interest rates at
time \(t\) applicable for lending and borrowing until time \(T\) (see, for example, Hull
2000, Section 3.8). When interest rates are the same at times \(t - 1\) and \(t\), it follows
that

\[ \log(f_{t,T}) - \log(f_{t-1,T}) = \log(p_t) - \log(p_{t-1}) + i_{t-1}^{*} - i_{t-1}. \tag{2.5} \]

Comparing equations (2.3)-(2.5), it can be seen that what we call the futures
return is the spot return minus the one-period domestic interest rate, when our
assumptions about arbitrage and interest rates are applicable, i.e.

\[ r_t^{(\mathrm{futures})} = r_t^{(\mathrm{spot})} - i_{t-1}. \]


This result simply reflects the fact that a futures trader does not lose interest on 
capital when financing trades. Similar price equations and conclusions can be 
obtained for stock index and commodity futures (Hull 2000). 
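
The relationship between futures and spot currency returns can be verified numerically. In this sketch the interest rates, spot prices and time to delivery are hypothetical, interest rates are assumed constant, and the futures prices are generated from the theoretical no-arbitrage equation rather than taken from a market.

```python
import math


def futures_price(spot, i_dom, i_for, periods_to_delivery):
    """Theoretical futures price: spot * exp((i - i*) * (T - t))."""
    return spot * math.exp((i_dom - i_for) * periods_to_delivery)


i_dom, i_for = 0.05 / 365, 0.03 / 365   # assumed constant one-period (daily) rates
p_prev, p = 1.1000, 1.1055              # hypothetical spot prices at t-1 and t
periods_left = 60                       # T - t at time t

f_prev = futures_price(p_prev, i_dom, i_for, periods_left + 1)
f = futures_price(p, i_dom, i_for, periods_left)

r_futures = math.log(f) - math.log(f_prev)          # equation (2.4)
r_spot = math.log(p) - math.log(p_prev) + i_for     # equation (2.3)

print(r_futures, r_spot - i_dom)
```

The two printed numbers agree, and the interest rate terms are small compared with the logarithmic price change of about 0.5%.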

The interpretation of futures returns is more complicated for interest-rate fu- 
tures. Treasury bond futures, for long-term debt, have delivery options which 
complicate the relationship between spot and futures prices. If convenient sim- 
plifying assumptions are made about delivery options and the shape of the term 
structure of interest rates, then it is possible to derive an equation of the form 


\[ r_t^{(\mathrm{futures})} = m_t \bigl( r_t^{(\mathrm{spot})} - i_{t-1} \bigr). \]

The multiplier \(m_t\) is less than 1, reflecting the fact that delivered bonds have a
shorter duration than present-day spot bonds. Treasury bill futures, for short-term 
debt, have market prices defined by equivalent annual interest rates instead of the 
futures price of a traded bill. The return calculated directly from market prices 
for 90-day bill futures is then approximately four times the return that would be 
calculated if prices were quoted for the delivery of 90-day bills. 


2.6 Further Examples of Time Series of Returns 


A database containing twenty time series of daily returns is used in Chapters 4—7 
to illustrate some typical empirical results. Each series covers a decade of trading, 
concluding on a date in the first half of the 1990s. Nine of the series contain spot 
returns from equity investments in indices or individual stocks. The other eleven 
series contain returns calculated from futures prices. Table 2.2 summarizes the 
dates and assets that define the series. 

The price data come from several sources: the Center for Research in Secu- 
rity Prices (CRSP), Datastream, banks, and the London International Financial 
Futures Exchange (LIFFE). These data cannot be provided to readers, for con- 
tractual reasons. 


2.6.1 Spot Series 


The nine spot equity series are for three indices, three US companies and three 
UK companies. Returns from the indices provide information about investments 
in market-wide diversified portfolios, in the US, the UK, and Japan. The US 
series provides returns on the Standard & Poor 500-share index, with dividends 
reinvested. The UK and Japanese series are, respectively, for the Financial Times 
100-share index and the Nikkei 225-share index. These two series do not incorporate
any dividends, so that the numbers we then call returns are simply logarithmic
price changes. There was trade on some Saturdays at the Tokyo Stock Exchange
until February 1989 but these days are excluded from our time series.

Table 2.2. Descriptions of twenty time series of daily returns.

                        Spot or                       Inclusive dates       No. of
Returns series          futures  Market               From       To         returns
S&P 500-share           S        New York             01/07/82   30/06/92   2529
S&P 500-share           F        Chicago (CME)        01/07/82   30/06/92   2529
Coca Cola               S        New York             03/01/84   31/12/93   2529
General Electric        S        New York             03/01/84   31/12/93   2529
General Motors          S        New York             03/01/84   31/12/93   2529
FT 100-share            S        London               02/01/85   30/12/94   2529
FT 100-share            F        London               02/01/85   30/12/94   2529
Glaxo                   S        London               04/01/82   31/12/91   2528
Marks & Spencer         S        London               04/01/82   31/12/91   2528
Shell                   S        London               04/01/82   31/12/91   2528
Nikkei 225-share        S        Tokyo                07/01/85   30/12/94   2464
Treasury bonds          F        Chicago (CBOT)       01/12/81   29/11/91   2528
3-month sterling bills  F        London               05/01/83   31/12/92   2527
DM/$                    F        Chicago (CME)        01/12/81   29/11/91   2529
Sterling/$              F        Chicago (CME)        01/12/81   29/11/91   2529
Swiss franc/$           F        Chicago (CME)        01/12/81   29/11/91   2529
Yen/$                   F        Chicago (CME)        01/12/81   29/11/91   2529
Gold                    F        New York (COMEX)     01/12/80   30/11/90   2522
Corn                    F        Chicago (CBOT)       01/12/80   30/11/90   2528
Live cattle             F        Chicago (CME)        01/12/80   30/11/90   2529

Dates are given as day/month/year.

Figure 2.7. Relative share index levels.

Figure 2.7 is a time series plot of the three indices, from January 1985 to June 
1992, with the indices scaled to begin at the same number and denominated in 
their domestic currencies. All the indices fell sharply in October 1987 and there 
is a global factor that explains the covariation between them. The correlation
between US and UK index returns on the same day equals 0.46. It equals 0.36 for
the UK and Japan, but only 0.19 for the US and Japan. The US/Japan correlation
increases to 0.37 when the American return is correlated with the Japanese return
for the previous day.

Figure 2.8. Payoffs from US stock investments.

General Electric, Coca Cola, and General Motors were all among the ten largest 
firms at the New York Stock Exchange, ranked by market capitalization at the end 
of 1993. The returns include dividend payments. Figure 2.8 shows the result of 
investing one dollar in each of the companies at the beginning of 1984, plotted on 
a logarithmic scale. The correlations between returns on the same day are fairly 
substantial. They equal 0.60 (CC, GM), 0.51 (GE, GM) and 0.45 (CC, GE). 

Glaxo, Shell, and Marks & Spencer have been three of the largest companies 
whose shares are traded at the London Stock Exchange. The market prices are 
for settlement on one of twenty-four dates per annum. The returns have been 
calculated from spot prices implied by market prices, short-term interest rates, 
and the assumption of no arbitrage profits; they include dividends. The correlations 
between returns from different stocks on the same day for the three UK pairings 
are 0.33, 0.35, and 0.37. 


2.6.2 Futures Series 


Each futures series is obtained from a set of forty or more contracts. These con- 
tracts have delivery dates in March, June, September, and December, unless stated 
otherwise. Returns are calculated from the contract nearest to delivery, except in 
delivery months when the second-nearest contract is used. For example, a March 
contract provides returns from the first trading day in December until the last 
trading day in February; the return for the first day in December is calculated 
from the March contract prices on the last day of November and the first day of 
December. 
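
The contract selection rule can be written as a short function. This is our reading of the rule for the quarterly delivery cycle described above, with a hypothetical function name: a return dated in a given month uses the first contract whose delivery month is strictly later than that month, so that contracts are never used during their own delivery month.

```python
def contract_for_month(year, month, delivery_months=(3, 6, 9, 12)):
    """Return (delivery_year, delivery_month) of the contract that supplies
    returns for trading days in the given calendar month."""
    later = [m for m in sorted(delivery_months) if m > month]
    if later:
        return year, later[0]
    # No later delivery month this year: use the first one of next year.
    return year + 1, min(delivery_months)


# The March 1991 contract covers December 1990 to February 1991.
for year, month in [(1990, 11), (1990, 12), (1991, 1), (1991, 2), (1991, 3)]:
    print((year, month), "->", contract_for_month(year, month))
```

Both prices in a daily return must come from the contract selected for the later of the two days, which is why the return for the first trading day in December uses March contract prices from the last day of November.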

Figure 2.9. Cumulative returns from currency futures.


Eight of the eleven futures series are for financial futures, reflecting the domi- 
nance of this sector of the futures industry. The S&P 500 and the FTSE 100 stock 
index futures contracts provide returns that can be compared with returns from 
the underlying spot indices. The impossibility of easy arbitrage profits guarantees 
that spot and futures returns are similar. The correlation between spot and futures 
returns for the same day is therefore high. It equals 0.94 for the S&P 500 data. The 
strong dependence between the returns diminished during the crash of October 
1987, when the spot and futures returns for the S&P 500 were (−0.20, −0.34) on
Monday, 19 October, (0.02, 0.07) on 20 October, and (0.08, 0.18) on 21 October. 

Bill and bond futures contracts provide returns on interest-rate products at 
opposite ends of the term structure. The short-term sterling contracts are for 
delivery of three-month UK Treasury bills. The long-term dollar contracts are 
designed around the delivery of US Treasury bonds with more than fifteen years 
to maturity. 

Currency futures prices are stated as the dollar price of one unit of foreign 
currency. The currency return series, for Deutsche mark, sterling, Swiss franc, 
and yen futures, contain returns from an American perspective. Figure 2.9 shows 
numbers that relate to a one dollar “investment” in long futures positions; after \(t\)
periods, the number plotted is \(\exp(r_1 + r_2 + \cdots + r_t)\). A common factor can be
seen that reflects the relative strength or weakness of the dollar. The correlations 
between returns from different currencies on the same day are substantial. They 
range from 0.57 for sterling/$ and yen/$ returns to 0.92 for DM/$ and franc/$ 
returns. 

Commodity futures prices provide the returns for the remaining three series. 
Gold, corn, and live cattle contracts have been some of the most actively traded 
nonfinancial contracts. The delivery months for gold and cattle are the even months 
(February, April, ..., December), while the months for corn are March, May, July, 
September, and December. 


Stochastic Processes:
Definitions and Examples


Prices and returns are modeled throughout this book by time-ordered sequences 
of random variables, called stochastic processes. This chapter reviews their defini- 
tions and properties. A key property is the level of correlation between variables 
measured at different times. We cover processes that exhibit a variety of cor- 
relation patterns, including several processes that possess no correlation across 
variables. 


3.1 Introduction 


A time series of returns is a single sample that arises from competitive trading at a 
market. We are interested in probability models that can explain the data that we 
see and that can then be used to make predictions. These probability models are 
called stochastic processes. They are sequences of random variables arranged in 
time order. This chapter reviews theoretical results. It is followed by data analysis 
in subsequent chapters that identifies and estimates appropriate models for prices 
and returns. 

Stochastic processes that are important for finance research are often identical 
or similar to processes that are used to model time series arising in economics and 
the physical sciences. Some of our definitions need to be more precise than those 
used elsewhere. In particular, we have to avoid assuming that distributions are 
normal and we must distinguish between independence and a lack of correlation. 

Properties of random variables are reviewed in Section 3.2. Finding a satis- 
factory model typically requires these properties to remain unchanged as time 
progresses. This leads us to the concept of a stationary stochastic process, which 
is covered in Section 3.3. The simplest example of a stationary process is a set of 
variables that have independent and identical distributions. This type of process 
and many others have no correlation between different terms from the process. 
We review a variety of uncorrelated processes in Section 3.4. 

More general stochastic processes, whose terms are correlated, are constructed 
from functions of uncorrelated variables. Autoregressive, moving-average, and 
mixed processes are introduced in Section 3.5 and their relevance is motivated by
theoretical examples for returns in Section 3.6. Integrated processes are defined 
in Section 3.7 and then fractional integration is described in Section 3.8. Stronger 
assumptions and their shortcomings for financial models are explained in Sec- 
tion 3.9. Two stochastic processes that are defined for a continuous range of times 
are mentioned in Section 3.10. 


3.2 Random Variables 


Anyone reading this book on a Monday will not know the closing level of a 
stock market index on the following day, assuming the market is open on that 
Tuesday. Thus, on Monday, we may regard the closing level on Tuesday as a 
random variable X. A day later we will discover the actual Tuesday closing level, 
or outcome, which will be some number x. On Monday we will, however, have 
some relevant information about possible outcomes on Tuesday and so we will 
be able to talk about a probability distribution for X. We now review standard 
definitions and results for random variables and probability distributions, focusing 
on the material that is most used in this book. 


3.2.1 One Variable 


For any random variable X, with possible outcomes that may range across all 
real numbers, the cumulative distribution function (c.d.f.) F is defined as the 
probability of an outcome at a particular level, or lower, i.e. 


\[ F(x) = P(X \le x) \]


with \(P(\cdot)\) referring to the probability of the bracketed event. Some random variables
are discrete, such as the number of trades during a period of trading. In contrast,
other variables have one or more continuous ranges of possible outcomes.
Prices and returns are very often modeled by continuous random variables, even 
though markets prescribe minimum price changes. 

Discrete random variables have either a finite or a countably infinite number
of possible outcomes. The probability distribution function \(f\) then states the
probabilities of the possible outcomes and we have

\[ f(x) = P(X = x) \ge 0, \qquad \sum_{x=-\infty}^{\infty} f(x) = 1, \qquad F(x) = \sum_{n=-\infty}^{x} f(n). \]

Furthermore, \(F\) is not a differentiable function of \(x\). Also, \(f(x) = F(x) - F(x - 1)\)
when the only possible outcomes are integers.

Most of the random variables that we consider are, however, continuous and 
their cumulative functions are also differentiable. The density function (d.f.) f is 


then

\[ f(x) = \frac{dF}{dx}, \]

with

\[ f(x) \ge 0, \qquad \int_{-\infty}^{\infty} f(x)\,dx = 1, \qquad F(x) = \int_{-\infty}^{x} f(t)\,dt. \]


The probability of an outcome within a short interval from \(x - \tfrac{1}{2}\delta\) to \(x + \tfrac{1}{2}\delta\) is
then approximately \(\delta f(x)\), while the exact probability for a general interval from
\(a\) to \(b\) is given by

\[ P(a \le X \le b) = F(b) - F(a) = \int_{a}^{b} f(x)\,dx. \]


The expectation or mean of a continuous random variable \(X\) is defined by

\[ E[X] = \int_{-\infty}^{\infty} x f(x)\,dx. \]

Here, and in many of the following equations, the equivalent definition for a 
discrete variable is given by replacing the integral by a sum. 

Any function \(Y = g(X)\) of a random variable \(X\) is also a random variable. The
expectation of \(Y\) can then be found either from its own density function or, as
follows, from the d.f. of \(X\):

\[ E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx. \]


An important example is the expectation of the squared distance from the mean,
now denoted by \(\mu\). This defines the variance of \(X\), often denoted by \(\sigma^2\):

\[ g(X) = (X - \mu)^2, \]

\[ \operatorname{var}(X) = E[g(X)] = \sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx, \]

which leads to the alternative expression

\[ \operatorname{var}(X) = E[X^2] - E[X]^2. \]


For all numbers \(a\) and \(b\), the random variable \(a + bX\) has expectation \(a + bE[X]\)
and variance \(b^2 \operatorname{var}(X)\).

The standard deviation of \(X\) is simply the square root of its variance. The \(n\)th
moment equals \(E[X^n]\) and the \(n\)th central moment is \(m_n = E[(X - \mu)^n]\). The
second central moment is another name for the variance. As becomes clearer later,
of particular importance for us are the skewness and kurtosis defined by

\[ \operatorname{skewness}(X) = m_3/m_2^{3/2} \quad \text{and} \quad \operatorname{kurtosis}(X) = m_4/m_2^{2}. \]
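
These definitions translate directly into code. The sketch below evaluates the central moments of a discrete variable whose listed outcomes are equally likely; the function names and the example outcomes are ours.

```python
def central_moment(xs, n):
    """n-th central moment when the outcomes in xs are equally likely."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** n for x in xs) / len(xs)


def skewness(xs):
    """m_3 / m_2^(3/2)."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def kurtosis(xs):
    """m_4 / m_2^2."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2


symmetric = [-2, -1, 0, 1, 2]     # symmetric outcomes: zero skewness
skewed = [0, 0, 0, 0, 10]         # one large outcome: positive skewness

print(skewness(symmetric), kurtosis(symmetric))
print(skewness(skewed), kurtosis(skewed))
```

Applied to a time series of returns, with each observation given equal weight, the same functions give the sample skewness and kurtosis (without any small-sample adjustment).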


3.2.2 The Normal Distribution


The normal (or Gaussian) distribution is our most important continuous distribu- 
tion. It is defined by the density function 


\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^{2} \right). \tag{3.1} \]


This density has two parameters: the mean \(\mu\) and the variance \(\sigma^2\) of the random
variable. We use the notation \(X \sim N(\mu, \sigma^2)\) when \(X\) has the above density.

A linear function of a normal variable is also normal. If \(X \sim N(\mu, \sigma^2)\) and
\(Y = a + bX\), then \(Y \sim N(a + b\mu, b^2\sigma^2)\). In particular, with \(a = -\mu/\sigma\) and
\(b = 1/\sigma\),

\[ X \sim N(\mu, \sigma^2) \;\Rightarrow\; Z = \frac{X - \mu}{\sigma} \sim N(0, 1). \]


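We say that \(Z\) has the standard normal distribution. Its d.f. is simply

\[ f(z) = \frac{1}{\sqrt{2\pi}} \exp\bigl( -\tfrac{1}{2} z^{2} \bigr), \tag{3.2} \]

and we may denote its c.d.f. by \(\Phi(z)\), which has to be evaluated by numerical
methods. The probabilities of outcomes for \(X\) within particular ranges can be
calculated from

\[ P(a \le X \le b) = \Phi\left( \frac{b - \mu}{\sigma} \right) - \Phi\left( \frac{a - \mu}{\sigma} \right). \]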


The density of the normal distribution is symmetric about its mean \(\mu\). Symmetry
ensures that all the odd central moments are zero and therefore the skewness of
the distribution is zero. The second and fourth central moments are respectively
\(\sigma^2\) and \(3\sigma^4\), so that all normal distributions have a kurtosis equal to three.

Exponential functions of normal variables are often encountered in finance. 
The general result for their expectations is 


\[ E[e^{uX}] = \exp\bigl( u\mu + \tfrac{1}{2} u^{2} \sigma^{2} \bigr). \tag{3.3} \]

This applies for all real and complex numbers \(u\).
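
Both the kurtosis of three and equation (3.3) can be checked by numerical integration against the normal density. The sketch below uses a simple trapezoidal rule; the helper names, the parameter values and the integration range of twelve standard deviations are our choices for illustration, and only real values of \(u\) are considered.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2), equation (3.1)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def expect(g, mu, sigma, width=12.0, steps=20000):
    """Trapezoidal approximation of E[g(X)] for X ~ N(mu, sigma^2)."""
    a, b = mu - width * sigma, mu + width * sigma
    h = (b - a) / steps
    total = 0.5 * (g(a) * normal_pdf(a, mu, sigma)
                   + g(b) * normal_pdf(b, mu, sigma))
    for k in range(1, steps):
        x = a + k * h
        total += g(x) * normal_pdf(x, mu, sigma)
    return total * h


mu, sigma, u = 0.1, 0.2, 2.0
lhs = expect(lambda x: math.exp(u * x), mu, sigma)
rhs = math.exp(u * mu + 0.5 * u ** 2 * sigma ** 2)
kurt = expect(lambda x: (x - mu) ** 4, mu, sigma) / sigma ** 4

print(lhs, rhs)   # numerical and closed-form values of E[exp(uX)]
print(kurt)       # fourth central moment divided by sigma^4
```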


3.2.3 The Lognormal Distribution


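A random variable \(Y\) has a lognormal distribution whenever \(\log(Y)\) has a normal
distribution. When \(\log(Y) \sim N(\mu, \sigma^2)\), the density function of \(Y\) is

\[ f(y) = \begin{cases} \dfrac{1}{y\sigma\sqrt{2\pi}} \exp\left( -\dfrac{1}{2} \left( \dfrac{\log(y) - \mu}{\sigma} \right)^{2} \right), & y > 0, \\[1ex] 0, & y \le 0. \end{cases} \tag{3.4} \]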

From equation (3.3), \(E[Y^n] = \exp(n\mu + \tfrac{1}{2} n^2 \sigma^2)\) for all \(n\). Consequently, the
mean and the variance of \(Y\) are

\[ E[Y] = \exp(\mu + \tfrac{1}{2}\sigma^2) \quad \text{and} \quad \operatorname{var}(Y) = \exp(2\mu + \sigma^2)(\exp(\sigma^2) - 1). \]

The mean exceeds the median, namely \(\exp(\mu)\), reflecting the positive skewness
of this nonsymmetric distribution.
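
The step from the moments to the variance may be worth writing out. Setting \(n = 1\) and \(n = 2\) in the expression for \(E[Y^n]\) and using the alternative expression for the variance from Section 3.2.1 gives the following; the numerical values in the last line are our own example, with \(\mu = 0\) and \(\sigma = 0.2\).

```latex
\begin{align*}
E[Y]   &= \exp(\mu + \tfrac{1}{2}\sigma^2), \qquad
E[Y^2]  = \exp(2\mu + 2\sigma^2), \\
\operatorname{var}(Y)
       &= E[Y^2] - E[Y]^2
        = \exp(2\mu + 2\sigma^2) - \exp(2\mu + \sigma^2)
        = \exp(2\mu + \sigma^2)\,(\exp(\sigma^2) - 1), \\
\mu = 0,\ \sigma = 0.2:\quad
E[Y]   &= \exp(0.02) \approx 1.0202 \;>\; \exp(\mu) = 1 .
\end{align*}
```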


3.2.4 Two Variables 


Two random variables \(X\) and \(Y\) have a bivariate c.d.f. that gives the probabilities
of both outcomes being less than or equal to levels \(x\) and \(y\) respectively:

\[ F(x, y) = P(X \le x \text{ and } Y \le y). \]

The bivariate d.f. is then defined for continuous variables by

\[ f(x, y) = \frac{\partial^{2} F}{\partial x\, \partial y}. \]


We are often only interested in conditional densities, such as the density of \(Y\)
when we know that the outcome for \(X\) is a particular number \(x\). Let \(f_X(x)\) now
denote the density of \(X\). Assuming \(f_X(x) > 0\), we adopt the notation \(f(y \mid x)\)
for the density of \(Y\) conditional on the event \(X = x\), and its definition is

\[ f(y \mid x) = f(x, y)/f_X(x). \]

The conditional expectation of \(Y\) given \(x\) is then

\[ E[Y \mid x] = \int_{-\infty}^{\infty} y f(y \mid x)\,dy. \]


We will also use the notation E[Y | X] to refer to the random variable whose 
outcome is defined by E[Y | x] when the outcome of X is x. If we want to 
emphasize the distinction between E[Y] and E[Y | x], then the first term may be 
called the unconditional expectation. 

The covariance between two variables is one measure of linear dependence 
between them. Using subscripts to indicate the variable, 


\[ \operatorname{cov}(X, Y) = \operatorname{cov}(Y, X) = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - E[X]E[Y]. \]

Note that the special case \(X = Y\) shows that \(\operatorname{cov}(X, X) = \operatorname{var}(X)\). The correlation
is another measure of linear dependence,

\[ \operatorname{cor}(X, Y) = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}. \]

This is often denoted by the symbol \(\rho\) and \(-1 \le \rho \le 1\). The correlation, unlike the
covariance, is essentially unchanged by linear transformations. For all numbers 
a, b, c, and d, 


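\[ \operatorname{cov}(a + bX, c + dY) = bd \operatorname{cov}(X, Y) \]

and, whenever \(bd > 0\),

\[ \operatorname{cor}(a + bX, c + dY) = \operatorname{cor}(X, Y). \]

The mean and variance of the sum of two random variables are

\[ E[X + Y] = E[X] + E[Y] \]

and

\[ \operatorname{var}(X + Y) = \operatorname{var}(X) + 2\operatorname{cov}(X, Y) + \operatorname{var}(Y). \]

More generally, for any numbers \(a\), \(b\), and \(c\), the first two moments of a linear
combination are

\[ E[a + bX + cY] = a + bE[X] + cE[Y] \]

and

\[ \operatorname{var}(a + bX + cY) = b^{2} \operatorname{var}(X) + 2bc \operatorname{cov}(X, Y) + c^{2} \operatorname{var}(Y). \]


3.2.5 Two Independent Variables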


Two random variables X and Y are independent if and only if the conditional 
densities of one variable (say Y) all equal that variable's unconditional density: 


\[ f(y \mid x) = f_Y(y) \quad \text{whenever } f_X(x) > 0. \]


This is equivalent to the factorization of the bivariate d.f. into the product of the 
two unconditional densities: 


\[ f(x, y) = f_X(x) f_Y(y) \quad \text{for all } x \text{ and } y. \]


Variables are dependent if they are not independent. 
Independence has many implications. First, conditional and unconditional ex- 
pectations are then equal: 


\[ E[Y \mid x] = E[Y] \quad \text{for all } x \text{ that have } f_X(x) > 0. \]

Second, the variables \(g(X)\) and \(h(Y)\) are also independent for all functions \(g\) and
\(h\). Third,

\[ E[XY] = E[X]E[Y] \]

and hence the correlation and covariance between independent variables are both
zero.

Although independence implies zero correlation, the converse does not follow.
There are many examples in this book of dependent variables that have zero 
correlation between them. Our first example is provided by the ARCH(1) model 
described later by equation (3.11). 
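
A much simpler example than the ARCH(1) model, and one that is ours rather than the book's, is a symmetric discrete variable \(X\) together with \(Y = X^2\). The sketch below uses exact fractions to show that the covariance is zero although \(Y\) is completely determined by \(X\).

```python
from fractions import Fraction

# Joint distribution: X takes -1, 0, 1 with equal probability and Y = X^2.
outcomes = {(-1, 1): Fraction(1, 3),
            (0, 0): Fraction(1, 3),
            (1, 1): Fraction(1, 3)}


def expectation(g):
    """E[g(X, Y)] under the joint distribution."""
    return sum(p * g(x, y) for (x, y), p in outcomes.items())


def prob(event):
    """P(event) under the joint distribution."""
    return sum(p for (x, y), p in outcomes.items() if event(x, y))


cov_xy = (expectation(lambda x, y: x * y)
          - expectation(lambda x, y: x) * expectation(lambda x, y: y))

p_y1 = prob(lambda x, y: y == 1)
p_y1_given_x0 = prob(lambda x, y: x == 0 and y == 1) / prob(lambda x, y: x == 0)

print(cov_xy)                 # covariance between X and Y
print(p_y1, p_y1_given_x0)    # unconditional and conditional probabilities
```

The conditional probability that \(Y = 1\) given \(X = 0\) differs from the unconditional probability, so the variables are dependent even though they are uncorrelated.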


3.2.6 Several Variables


A general set of \(n\) variables \(\{Y_1, Y_2, \ldots, Y_n\}\) has the multivariate c.d.f. defined
by

\[ F(y_1, y_2, \ldots, y_n) = P(Y_1 \le y_1, Y_2 \le y_2, \ldots, Y_n \le y_n). \]

The multivariate d.f. for continuous variables is

\[ f(y_1, y_2, \ldots, y_n) = \frac{\partial^{n} F}{\partial y_1 \cdots \partial y_n}. \]

It is often stated as the product of the first unconditional density and \(n - 1\)
conditional densities:

\[ f(y_1, y_2, \ldots, y_n) = f_{Y_1}(y_1)\, f(y_2 \mid y_1)\, f(y_3 \mid y_1, y_2) \cdots f(y_n \mid y_1, y_2, \ldots, y_{n-1}). \]


The first two moments of the general linear combination are 


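\[ E\left[ a + \sum_{i=1}^{n} b_i Y_i \right] = a + \sum_{i=1}^{n} b_i E[Y_i] \tag{3.5} \]

and

\[ \operatorname{var}\left( a + \sum_{i=1}^{n} b_i Y_i \right) = \sum_{i=1}^{n} b_i^{2} \operatorname{var}(Y_i) + \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} 2 b_i b_j \operatorname{cov}(Y_i, Y_j). \tag{3.6} \]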
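
Equation (3.6) is the familiar portfolio variance formula. As a sketch, the code below evaluates it term by term and compares the answer with the equivalent quadratic form in the covariance matrix; the weights and covariances are hypothetical and the function names are ours.

```python
def var_linear_combination(b, cov):
    """Equation (3.6): variances plus twice the covariances for i < j."""
    n = len(b)
    own = sum(b[i] ** 2 * cov[i][i] for i in range(n))
    cross = sum(2 * b[i] * b[j] * cov[i][j]
                for i in range(n - 1) for j in range(i + 1, n))
    return own + cross


def quadratic_form(b, cov):
    """The same variance written as a double sum over all pairs (i, j)."""
    n = len(b)
    return sum(b[i] * cov[i][j] * b[j] for i in range(n) for j in range(n))


# Hypothetical covariance matrix for three variables and weights b_i.
cov = [[4.0, 1.0, 0.5],
       [1.0, 9.0, -2.0],
       [0.5, -2.0, 1.0]]
b = [0.5, 0.3, 0.2]

v = var_linear_combination(b, cov)
print(v, quadratic_form(b, cov))
```

The constant \(a\) does not appear because adding a constant leaves the variance unchanged.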


The combination \(a + \sum b_i Y_i\) has a normal distribution when the component variables
have a multivariate normal distribution. This distribution has d.f.

\[ f(y) = \frac{1}{(2\pi)^{n/2} \sqrt{\det(\Omega)}} \exp\left( -\tfrac{1}{2} (y - \mu)' \Omega^{-1} (y - \mu) \right) \tag{3.7} \]

for vectors \(y = (y_1, \ldots, y_n)'\), \(\mu = (\mu_1, \ldots, \mu_n)'\), with \(\mu_i = E[Y_i]\), and a matrix
\(\Omega\) that has elements given by \(\omega_{i,j} = \operatorname{cov}(Y_i, Y_j)\) and a determinant denoted by
\(\det(\Omega)\).


3.2.7 Several Independent Variables


Random variables are independent if information about the outcomes of some 
of the variables tells us nothing new about the distributions of the remaining 
variables. The d.f. then factorizes as 


\[ f(y_1, y_2, \ldots, y_n) = f_{Y_1}(y_1) f_{Y_2}(y_2) \cdots f_{Y_n}(y_n). \]

An immediate consequence of independence is that all covariances between different
variables are zero and hence equation (3.6) simplifies. In particular, the
variance of the sum of n variables equals the sum of the n individual variances. 

In many probability models, the unconditional distributions of independent 
variables are assumed to be identical. There is then a common univariate d.f., say 
\(f_Y(y)\), and the multivariate d.f. is simply

\[ f(y_1, y_2, \ldots, y_n) = f_Y(y_1) f_Y(y_2) \cdots f_Y(y_n). \]


The \(n\) variables \(\{Y_1, Y_2, \ldots, Y_n\}\) are then said to be independent and identically
distributed (i.i.d.). An infinite number of variables are said to be i.i.d. if all finite
subsets have the i.i.d. property.

The average \(\bar{Y}_n = (Y_1 + \cdots + Y_n)/n\) of a “large” number of i.i.d. variables has
an approximate normal distribution for any d.f. \(f_Y(y)\) that has finite mean and
variance, say \(\mu\) and \(\sigma^2\). More precisely, the central limit theorem states that the
distribution of \((\bar{Y}_n - \mu)/(\sigma/\sqrt{n})\) converges to the standard normal distribution
as \(n\) increases.
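
A small simulation illustrates the theorem. The sketch below is only indicative: the choice of the exponential distribution (which is far from normal, with mean and standard deviation both equal to one), the sample size of 50, the number of replications and the seed are all our assumptions.

```python
import math
import random


def standardized_mean(n, rng):
    """Average n i.i.d. exponential(1) variables (mu = 1, sigma = 1) and
    return (average - mu) / (sigma / sqrt(n))."""
    average = sum(rng.expovariate(1.0) for _ in range(n)) / n
    return (average - 1.0) / (1.0 / math.sqrt(n))


rng = random.Random(12345)      # fixed seed so that the run is repeatable
draws = [standardized_mean(50, rng) for _ in range(4000)]

# For a standard normal variable, P(|Z| <= 1.96) is approximately 0.95.
inside = sum(abs(z) <= 1.96 for z in draws) / len(draws)
print(inside)
```

The proportion of standardized averages within 1.96 of zero should be close to the standard normal value of 0.95, even though the individual variables are strongly skewed.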


3.3 Stationary Stochastic Processes 


A stochastic process is a sequence of random variables in time order. Sometimes it 
is called the process generating observed data or, more simply, either a process or a 
model. A stochastic process is often denoted by a typical variable in curly brackets, 
e.g. $\{X_t\}$ with $t$ representing time. Almost all of our examples are for integer times
and sometimes an infinite timescale is used. In due course we consider processes 
for prices, returns, and measures of volatility. 

A time-ordered set of observations, $\{x_1, x_2, x_3, \ldots, x_n\}$, is called a time series.
Much of this book is about methods for inferring and estimating the properties
of the stochastic process that generated a time series of returns. It is of particular
interest to describe the distribution of $X_t$ conditional upon the historical record
until time $t - 1$.

Several categories of stochastic processes are defined in the following pages. 
A summary of their definitions is provided by Table 3.1. 


3.3.1  Stationarity 


Stochastic processes are often defined by either multivariate or conditional distri- 
butions. These distributions will depend on a set of parameters that may change 
through time. Parameter estimation from a time series is only feasible when the 
number of parameters does not exceed the number of observations. Estimation 
becomes much simpler when all the parameters remain constant as time pro- 
gresses. This requires that the distributions of random variables remain fixed as 
time progresses.

Table 3.1. Definitions of ten types of stochastic process.

A process is...           If...
------------------------  ------------------------------------------------------
Strictly stationary       The multivariate distribution function for k
                          consecutive variables does not depend on the time
                          subscript attached to the first variable (any k).
Stationary                Means and variances do not depend on time subscripts,
                          covariances depend only on the difference between the
                          two subscripts.
Uncorrelated              The correlation between variables having different
                          time subscripts is always 0.
Autocorrelated            It is not uncorrelated.
White noise               The variables are uncorrelated, stationary and have
                          mean equal to 0.
Strict white noise        The variables are independent and have identical
                          distributions whose mean is equal to 0.
A martingale              The expected value of variable t, conditional on the
                          information provided by all previous values, equals
                          variable t - 1.
A martingale difference   The expected value of variable t, conditional on the
                          information provided by all previous values, always
                          equals 0.
Gaussian                  All multivariate distributions are multivariate normal.
Linear                    It is a linear combination of the present and past
                          terms from a strict white noise process.


A stochastic process $\{X_t\}$ is strictly stationary if the multivariate, cumulative
distribution functions of $(X_i, X_{i+1}, \ldots, X_{i+k-1})$ and $(X_j, X_{j+1}, \ldots, X_{j+k-1})$
are identical, for all integers $i$, $j$ and for all $k > 0$.

A special example of a strictly stationary process is given by supposing returns 
have independent and identical normal distributions. All multivariate distributions 
are then determined by the mean and variance parameters of the identical distri- 
butions. An example of a process for daily returns that is not strictly stationary is 
given by independent normal distributions whose variances depend on the day of 
the week. 

It is only practical to check the stationarity of some of the properties of a 
stochastic process. Suppose $\{X_t\}$ is strictly stationary. Then $X_i$ and $X_j$ have
identical distributions and hence their expectations and variances are all equal to
constant values, $\mu = E[X_t]$ and $\sigma^2 = \operatorname{var}(X_t)$. Also, because the pairs $(X_i, X_{i+\tau})$
and $(X_j, X_{j+\tau})$ have identical bivariate distributions, the autocovariances

$$\lambda_\tau = \operatorname{cov}(X_t, X_{t+\tau}) = E[(X_t - \mu)(X_{t+\tau} - \mu)] \tag{3.8}$$

only depend on the time interval $\tau$, or lag, between the two times $t$ and $t + \tau$.
When $\tau = 0$, $\lambda_0 = \sigma^2$.

The first- and second-order moments of a stochastic process are its means, 
variances, and covariances. If these moments do not change with time, then the 
stochastic process has various names in the statistical literature: second-order sta- 
tionary, covariance stationary, and weakly stationary. All these phrases will be 
abbreviated to the single word, stationary. Any process that is not stationary is 
called nonstationary. It is assumed throughout this chapter that stationary pro- 
cesses have finite first- and second-order moments. Processes for which these 
moments are not defined are discussed in Section 4.8. 

Many credible models for returns are stationary. Equity prices and exchange 
rates, however, are not characterized by stationary processes (Baillie and Boller- 
slev 1994). Some of the evidence for this general conclusion comes from unit root 
tests (e.g. Baillie and Bollerslev 1989a). The conclusion should not be surprising.
Inflation increases the expectations of future prices for many assets. Thus the first 
moment changes. Deflating prices could provide constant expected values. Even 
then, however, the variances of prices are likely to increase as time progresses. 
This is always the case for a random walk process. If $P_t$ represents either the price
or its logarithm and if the first difference $Z_t = P_t - P_{t-1}$ has positive variance
and is uncorrelated with $P_{t-1}$, then


$$\operatorname{var}(P_t) = \operatorname{var}(P_{t-1} + Z_t) = \operatorname{var}(P_{t-1}) + \operatorname{var}(Z_t) > \operatorname{var}(P_{t-1}),$$


so that the variances depend on the time t. The real spot prices of commodities 
might be stationary but the corresponding futures prices are theoretically non- 
stationary (Samuelson 1965, 1976). 
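
The variance argument for a random walk can be checked by simulation. The sketch below is illustrative only: it assumes i.i.d. Gaussian steps $Z_t$ with unit variance and a starting level $P_0 = 0$ (neither choice comes from the text), and the function names are hypothetical. It simulates many independent random walks and compares the variance of $P_t$ across paths with the theoretical value $t \operatorname{var}(Z_t)$.

```python
import random
import statistics


def simulate_random_walks(n_paths, n_steps, step_sd=1.0, seed=12345):
    """Simulate random walks P_t = P_{t-1} + Z_t with P_0 = 0.

    Assumption for illustration: the steps Z_t are i.i.d. N(0, step_sd^2).
    Returns a list of paths; path[t - 1] holds P_t for t = 1, ..., n_steps.
    """
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        level = 0.0
        path = []
        for _ in range(n_steps):
            level += rng.gauss(0.0, step_sd)
            path.append(level)
        paths.append(path)
    return paths


def variance_across_paths(paths, t):
    """Sample variance of P_t taken across the simulated paths."""
    return statistics.variance([path[t - 1] for path in paths])


paths = simulate_random_walks(n_paths=4000, n_steps=50)
sample_variances = {t: variance_across_paths(paths, t) for t in (1, 10, 25, 50)}

for t, v in sample_variances.items():
    # With var(Z_t) = 1 and P_0 = 0, the theoretical variance of P_t is t.
    print(f"t = {t:2d}   sample var(P_t) = {v:6.2f}   theory = {t:6.2f}")
```

Under these assumptions the sample variances should grow roughly in proportion to $t$, so no single variance parameter can describe all of the $P_t$.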


3.3.2 Gaussian Processes 


The random variables defining a stationary process have a general probability 
distribution. A process is called Gaussian if the multivariate distribution of the 
consecutive variables $(X_{t+1}, X_{t+2}, \ldots, X_{t+k})$ is multivariate normal for all integers
$t$ and $k$. A stationary Gaussian process is always strictly stationary, because
then the first- and second-order moments completely determine the multivariate 
distributions. 

Although returns are certainly not generated by a Gaussian process, it will be 
shown in Chapters 9-11 that interesting and useful models for returns can be 
constructed from one or more stationary Gaussian processes. 


3.3.3 Autocorrelation 


The correlation between two random variables $X_t$ and $X_{t+\tau}$, whose process is
stationary, is called the autocorrelation at lag $\tau$. The notation $\rho_\tau$ is used for this
correlation. As the variances of $X_t$ and $X_{t+\tau}$ both equal $\lambda_0$,

$$\rho_\tau = \operatorname{cov}(X_t, X_{t+\tau})/\lambda_0 = \lambda_\tau/\lambda_0. \tag{3.9}$$

Then $\rho_0 = 1$ and $\rho_\tau = \rho_{-\tau}$. As $\rho_\tau$ is a correlation, $-1 \le \rho_\tau \le 1$.
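
In practice the autocorrelations in (3.9) have to be estimated from a single time series. A minimal sketch of one common estimator follows. The estimator and the function name are assumptions of this example, since the text at this point defines only the population quantity $\rho_\tau$.

```python
def sample_autocorrelations(x, max_lag):
    """Estimate rho_0, ..., rho_max_lag from one observed time series.

    Uses r_tau = c_tau / c_0 with
    c_tau = sum over t of (x_t - xbar) * (x_{t+tau} - xbar).
    This estimator is an assumption of the sketch, not a definition
    taken from the text.
    """
    n = len(x)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(x)")
    xbar = sum(x) / n
    dev = [value - xbar for value in x]
    c0 = sum(d * d for d in dev)
    if c0 == 0.0:
        raise ValueError("the series is constant, so correlations are undefined")
    return [
        sum(dev[t] * dev[t + tau] for t in range(n - tau)) / c0
        for tau in range(max_lag + 1)
    ]


series = [1.0, 2.0, 3.0, 4.0, 5.0]  # a toy series, not market data
acf = sample_autocorrelations(series, max_lag=2)
print(acf)
```

For the toy series the deviations from the mean are $-2, -1, 0, 1, 2$, so the lag-one estimate is $4/10$ and the lag-two estimate is $-1/10$.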

The notation $\rho_\tau$ can also be used for nonstationary processes when the correlation
between $X_t$ and $X_{t+\tau}$ depends on $\tau$ alone. For example, the variables $X_t$
could have time-dependent autocovariances yet have autocorrelations determined 
solely by the time lag between variables. 

An important property of the autocorrelations of a stationary process is that 
they determine optimal, linear forecasts, where optimal is defined as minimizing 
the mean square error of a forecast. For example, suppose 


$$f_{t,1} = \mu + \delta + \sum_{i=0}^{\infty} \beta_i (X_{t-i} - \mu)$$

is a linear forecast of $X_{t+1}$ made at time $t$, with $\mu = E[X_t]$ and $\delta, \beta_0, \beta_1, \ldots$ being
constants. It can then be shown that the mean square error, $E[(X_{t+1} - f_{t,1})^2]$,
equals $\delta^2$ plus $\lambda_0$ multiplied by a function of the terms $\beta_i$ and $\rho_\tau$. Thus the optimal,
linear forecast is unbiased ($\delta = 0$) with the best $\beta_i$, $i \ge 0$, determined by the
sequence $\rho_\tau$, $\tau > 0$.


3.3.4 Spectral Density 


The autocorrelations $\rho_\tau$ and the variance $\lambda_0$ conveniently summarize the second-order
moments of a stationary process. An equivalent representation of these
moments is provided by the spectral density function, but it will receive far less
attention. It is the function of the frequency $\omega$ defined by

$$s(\omega) = \frac{\lambda_0}{2\pi}\Bigl[1 + 2\sum_{\tau=1}^{\infty} \rho_\tau \cos(\tau\omega)\Bigr]. \tag{3.10}$$

The integral of $s(\omega)$ from $0$ to $2\pi$ equals $\lambda_0$. High values of $s(\omega)$ might indicate
cyclical behavior with the period of one cycle equal to $2\pi/\omega$ time units. In financial
applications, the frequency-domain function $s(\omega)$ is often more difficult to
estimate and interpret than the time-domain sequence $\lambda_\tau$. Consequently, this text
concentrates on time-domain methods.


3.4 Uncorrelated Processes 


The simplest possible autocorrelations occur when a process is a collection of 
uncorrelated random variables, so 


$$\rho_0 = 1 \quad\text{and}\quad \rho_\tau = 0 \ \text{ for all } \tau > 0.$$

Figure 3.1. Relationships between categories of uncorrelated processes. An arrow pointing
from one category to another indicates that all processes in the former category are also in the 
latter category and the converse is false: some processes in the latter category are not members 
of the former category. It is assumed that all processes have finite means and variances. 


Any such process, whether stationary or nonstationary, will be called uncorrelated. 
The optimal linear forecast of $X_{t+1}$ is then simply its unconditional mean. The
adjective autocorrelated will be used if a process is not uncorrelated. 

Uncorrelated processes are often components of models for asset returns, 
because they are sometimes supported by empirical evidence and, in some cir- 
cumstances, by the theory of efficient markets. Three categories of uncorrelated 
processes are of particular importance, namely, white noise processes, strict white 
noise processes, and martingale differences. These are all zero-mean processes, 
i.e. $\mu = 0$. Figure 3.1 summarizes the relationships between the various categories
of zero-mean, uncorrelated processes that are discussed in this section. 

A process is white noise if it is stationary, uncorrelated, and has zero mean. 
Its spectral density function is the same constant for all frequencies $\omega$, hence all
frequencies contribute equally to the spectrum just as all colors contribute equally 
to white light. 

Our definition of white noise is also used by Hamilton (1994). The definition
states fewer assumptions than are found in some texts (e.g. Tsay 2002). The
absence of correlation from a white noise process does not imply independence
between variables. The stronger assumptions that the variables are independent
and identically distributed (i.i.d.), with zero means, define strict white noise
(SWN). Gaussian white noise is strict white noise because uncorrelated variables 
are independent variables when their multivariate distribution is normal. 

The distinction between white noise and strict white noise is important when 
considering the non-Gaussian processes that are required to model returns. For 
example, a satisfactory process for exchange rate returns might have zero mean,
be stationary, and possess volatility clustering. Then the process is neither i.i.d.
nor SWN because information about recent volatility influences the conditional 
variance of subsequent returns. The process might, nevertheless, be white noise 
because the volatility information may be irrelevant for predicting the level of 
returns. 

Independent and identically distributed variables are the primary building 
blocks when constructing stochastic processes. Those white noise processes 
which are not i.i.d. can often be constructed from a transformation of i.i.d. variables.
The ARCH(1) model of Engle (1982), which also has the property of
volatility clustering, is a typical example. Let $\{\eta_t\}$ be a zero-mean, unit-variance,
i.i.d. process and let

$$X_t = \eta_t\,(\alpha + \beta X_{t-1}^2)^{1/2}. \tag{3.11}$$

Then $\{X_t\}$ is uncorrelated, because $\eta_t$ is independent of all variables with earlier
time subscripts and hence $E[X_t X_{t-\tau}] = 0$ for all positive lags. The process is
stationary for suitable positive choices of $\alpha$ and $\beta$ and is then white noise. However,
the process is not i.i.d. because the conditional expectation $E[X_t^2 \mid X_{t-1}]$
equals $\alpha + \beta X_{t-1}^2$ and thus $X_t$ is not independent of $X_{t-1}$.
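
A short simulation can illustrate how the process in (3.11) is white noise without being i.i.d. The sketch assumes Gaussian $\eta_t$ and the hypothetical values $\alpha = 1$ and $\beta = 0.3$, none of which come from the text. The comparison value $\alpha/(1 - \beta)$ for the variance follows from taking expectations of the conditional variance $\alpha + \beta X_{t-1}^2$ under stationarity.

```python
import math
import random


def simulate_arch1(n, alpha, beta, seed=2024, burn_in=500):
    """Simulate X_t = eta_t * sqrt(alpha + beta * X_{t-1}^2), as in (3.11).

    Assumption for illustration: eta_t is i.i.d. standard normal.
    The first burn_in values are discarded to reduce start-up effects.
    """
    rng = random.Random(seed)
    x_prev = 0.0
    out = []
    for t in range(n + burn_in):
        x_prev = rng.gauss(0.0, 1.0) * math.sqrt(alpha + beta * x_prev * x_prev)
        if t >= burn_in:
            out.append(x_prev)
    return out


def lag_one_autocorrelation(x):
    """Sample lag-one autocorrelation of a series."""
    n = len(x)
    xbar = sum(x) / n
    dev = [v - xbar for v in x]
    return sum(dev[t] * dev[t + 1] for t in range(n - 1)) / sum(d * d for d in dev)


alpha, beta = 1.0, 0.3  # hypothetical parameter values
x = simulate_arch1(100_000, alpha, beta)
r1_levels = lag_one_autocorrelation(x)
r1_squares = lag_one_autocorrelation([v * v for v in x])
sample_var = sum(v * v for v in x) / len(x)  # the mean is zero by construction

print(f"lag-1 autocorrelation of X_t:   {r1_levels:.3f}")
print(f"lag-1 autocorrelation of X_t^2: {r1_squares:.3f}")
print(f"sample variance {sample_var:.3f} vs alpha / (1 - beta) = {alpha / (1 - beta):.3f}")
```

The levels should show almost no autocorrelation while the squares are clearly autocorrelated, which is the volatility clustering described above.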

Another category of uncorrelated processes is given by differencing a martingale.
A process $\{M_t\}$ is a martingale, with respect to the information provided by
its own history, if

$$E[M_t \mid M_{t-1}, M_{t-2}, \ldots] = M_{t-1}. \tag{3.12}$$

The differences $X_t = M_t - M_{t-1}$ then define a martingale difference process and
have the fair game property:

$$E[X_t \mid X_{t-1}, X_{t-2}, \ldots] = 0. \tag{3.13}$$

It follows that a martingale difference (MD) is a zero-mean, uncorrelated process.
Consequently, a stationary MD is white noise. Other MDs are nonstationary and 
thus are not white noise. 

White noise processes may not be MDs because the conditional expectation of
an uncorrelated process can be a nonlinear function. Illustrations of this mathematical
result tend to be contrived when discussing models for returns. An example
is the following bilinear model mentioned by Granger and Newbold (1986):

$$X_t = \beta X_{t-2}\,\varepsilon_{t-1} + \varepsilon_t \tag{3.14}$$

with the residuals $\varepsilon_t$ a unit-variance SWN process. This bilinear model is white
noise when $0 < \beta < 1/\sqrt{2}$. Its conditional expectations (and hence optimal forecasts)
are the following nonlinear functions:

$$E[X_t \mid X_{t-1}, X_{t-2}, \ldots] = -\sum_{i=1}^{\infty} (-\beta)^i X_{t-i} \prod_{j=2}^{i+1} X_{t-j}.$$

The weight attached to $X_{t-i}$ is then a random variable, unlike the constant term
that appears in linear forecasts.

Figure 3.2. Examples of AR(1) autocorrelations.


3.5 ARMA Processes 


A white noise process $\{\varepsilon_t\}$ is often used to construct a general autocorrelated
process. Three examples of processes used to model returns and derived series
are described. Afterwards, the general autoregressive, moving-average (ARMA)
model is presented. The $\varepsilon_t$ can have any distribution with finite variance $\sigma_\varepsilon^2$. They
might be independent variables, but this restriction is not necessary and so it is
not assumed. We will often use the following basic properties of white noise:

$$E[\varepsilon_t] = 0, \quad E[\varepsilon_t^2] = \sigma_\varepsilon^2, \quad E[\varepsilon_t \varepsilon_{t+\tau}] = 0 \ \text{ for all } t \text{ and for all } \tau \ne 0.$$


3.5.1 AR(1) 


First, consider a process $\{X_t\}$ defined by

$$X_t - \mu = \phi(X_{t-1} - \mu) + \varepsilon_t. \tag{3.15}$$

Then $X_t$ depends linearly on $X_{t-1}$ and the innovation (or residual) $\varepsilon_t$ alone.
The process $\{X_t\}$ is called an autoregressive process of order one, abbreviated to
AR(1). The process is stationary if, as will always be assumed, the autoregressive
parameter $\phi$ satisfies the inequality $|\phi| < 1$. The other two parameters of an AR(1)
process are its mean, $\mu = E[X_t]$, and variance, $\lambda_0 = \operatorname{var}(X_t) = \sigma_\varepsilon^2/(1 - \phi^2)$.
An AR(1) process has autocorrelations

$$\rho_\tau = \phi^\tau, \quad \tau \ge 0. \tag{3.16}$$

These autocorrelations decrease slowly when $\phi$ is near one. Figure 3.2 illustrates
the autocorrelations when $\phi$ is either 0.9 or 0.98.

An example of a process that has been modeled as AR(1) is the logarithm 
of price volatility, to be considered in Chapter 11. Figure 3.3 shows a series of
annualized volatility values simulated from this process when $\phi = 0.98$, with
Gaussian innovations, for 500 consecutive days. These observations can be far
from the median level of 10% for long periods of time.

Figure 3.3. Observations from an AR(1) process.

Equation (3.16) for the autocorrelations, and many others, can be obtained by
using the lag operator $L$, defined by $L a_t = a_{t-1}$ for any infinite sequence of
variables or numbers $\{a_t\}$. Repeated application of the operator $L$ gives
$L^k X_t = X_{t-k}$ and $L^k \mu = \mu$ for all integers $k$. Equation (3.15) can be rewritten as

$$(1 - \phi L)(X_t - \mu) = \varepsilon_t. \tag{3.17}$$

As $|\phi| < 1$, there is the result

$$\frac{1}{1 - \phi L} = \sum_{i=0}^{\infty} \phi^i L^i$$

and therefore

$$X_t - \mu = \frac{1}{1 - \phi L}\,\varepsilon_t = \sum_{i=0}^{\infty} \phi^i L^i \varepsilon_t = \sum_{i=0}^{\infty} \phi^i \varepsilon_{t-i}. \tag{3.18}$$

Thus $X_t$ is an infinite-order weighted average of the present and past innovations.
It follows that $\varepsilon_t$ and $X_{t-\tau}$ are uncorrelated whenever $\tau$ is positive. The autocovariances
of the AR(1) process are given by multiplying both sides of equation
(3.15) by $X_{t-\tau} - \mu$, followed by taking expectations, to give

$$\lambda_\tau = \phi\lambda_{\tau-1} + E[\varepsilon_t (X_{t-\tau} - \mu)] = \phi\lambda_{\tau-1} = \phi^\tau \lambda_0, \quad \tau \ge 1,$$

from which the autocorrelations are as stated above in equation (3.16).
The optimal linear forecast of $X_{t+N}$, made at time $t$, is given by

$$f_{t,N} = \mu + \phi^N (X_t - \mu). \tag{3.19}$$

This can be deduced from the more general result proved soon for the ARMA(1, 1)
process.
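
The AR(1) results can be illustrated numerically. The following sketch simulates (3.15), compares sample autocorrelations with $\phi^\tau$ from (3.16) and the sample variance with $\sigma_\varepsilon^2/(1 - \phi^2)$, and evaluates the forecast (3.19). The Gaussian innovations, the parameter values, and the function names are assumptions chosen for illustration.

```python
import math
import random


def simulate_ar1(n, mu, phi, sigma_eps, seed=7):
    """Simulate X_t - mu = phi * (X_{t-1} - mu) + eps_t, equation (3.15).

    Assumptions for illustration: Gaussian innovations, and a first value
    drawn from the stationary distribution N(mu, sigma_eps^2 / (1 - phi^2)).
    """
    rng = random.Random(seed)
    x = mu + rng.gauss(0.0, sigma_eps / math.sqrt(1.0 - phi * phi))
    out = [x]
    for _ in range(n - 1):
        x = mu + phi * (x - mu) + rng.gauss(0.0, sigma_eps)
        out.append(x)
    return out


def sample_autocorrelation(x, tau):
    """Sample autocorrelation at lag tau."""
    n = len(x)
    xbar = sum(x) / n
    dev = [v - xbar for v in x]
    return sum(dev[t] * dev[t + tau] for t in range(n - tau)) / sum(d * d for d in dev)


def ar1_forecast(mu, phi, x_t, horizon):
    """Optimal linear forecast of X_{t+N} from equation (3.19)."""
    return mu + phi ** horizon * (x_t - mu)


mu, phi, sigma_eps = 0.0, 0.9, 1.0  # hypothetical parameter values
x = simulate_ar1(100_000, mu, phi, sigma_eps)
xbar = sum(x) / len(x)
sample_var = sum((v - xbar) ** 2 for v in x) / len(x)
theoretical_var = sigma_eps ** 2 / (1.0 - phi ** 2)

for tau in (1, 2, 5):
    print(f"lag {tau}: sample {sample_autocorrelation(x, tau):.3f}, theory {phi ** tau:.3f}")
print(f"variance: sample {sample_var:.3f}, theory {theoretical_var:.3f}")
print(f"two-step forecast from x_t = 1.1 with mu = 0.1: {ar1_forecast(0.1, phi, 1.1, 2):.3f}")
```

The forecast decays geometrically towards the mean as the horizon grows, mirroring the decay of the autocorrelations.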
Figure 3.4. Examples of MA(1) autocorrelations.


3.5.2 MA(1) 


Second, consider a process $\{X_t\}$ defined by

$$X_t = \mu + \varepsilon_t + \theta\varepsilon_{t-1}. \tag{3.20}$$

Now $X_t$ is a linear function of the present and previous innovations. The process
$\{X_t\}$ is called a moving-average process of order one, summarized by MA(1).
The process is always stationary. It will be assumed that the moving-average
parameter $\theta$ satisfies the invertibility condition $|\theta| < 1$ and then optimal linear
forecasts can be calculated. The other two parameters of an MA(1) process are
its mean, $\mu = E[X_t]$, and variance, $\lambda_0 = \operatorname{var}(X_t) = (1 + \theta^2)\sigma_\varepsilon^2$.

The autocovariances of an MA(1) process are

$$\lambda_\tau = \operatorname{cov}(X_t, X_{t+\tau}) = E[(\varepsilon_t + \theta\varepsilon_{t-1})(\varepsilon_{t+\tau} + \theta\varepsilon_{t+\tau-1})],$$

which are zero whenever $\tau > 1$, while $\lambda_1 = \theta\sigma_\varepsilon^2$. An MA(1) process thus has
autocorrelations

$$\rho_1 = \frac{\theta}{1 + \theta^2}, \qquad \rho_\tau = 0 \ \text{ for } \tau \ge 2. \tag{3.21}$$

The jump to zero autocorrelation at lags two and higher contrasts with the geometric
decay of AR(1) autocorrelations. Figure 3.4 shows the autocorrelations
when $\theta$ is either 0.1 or 0.25. The optimal linear forecasts are given by

$$f_{t,1} = \mu + \theta(X_t - f_{t-1,1}) \quad\text{and}\quad f_{t,N} = \mu, \ N \ge 2. \tag{3.22}$$

Returns from stock indices are an example of a process that has often been
modeled as MA(1) with $\theta$ a small positive number. Higher levels of dependence
occur for “returns” that are calculated from the monthly average of daily prices.
Following Working (1960), a typical model for “returns” is then MA(1) with
$\theta = 0.25$. Figure 3.5 shows a series of “returns” simulated from this process,
with Gaussian innovations, for 100 consecutive days.
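
The MA(1) formulas can be tried out on simulated data. The sketch below evaluates $\rho_1$ from (3.21), simulates (3.20) with $\theta = 0.25$, and runs the forecast recursion in (3.22). Starting the recursion with a first forecast equal to $\mu$, the Gaussian innovations, and the function names are assumptions of this example.

```python
import random


def ma1_rho1(theta):
    """First-lag autocorrelation of an MA(1) process, equation (3.21)."""
    return theta / (1.0 + theta * theta)


def simulate_ma1(n, mu, theta, sigma_eps, seed=99):
    """Simulate X_t = mu + eps_t + theta * eps_{t-1}, equation (3.20).

    Assumption for illustration: Gaussian innovations.
    """
    rng = random.Random(seed)
    eps_prev = rng.gauss(0.0, sigma_eps)
    out = []
    for _ in range(n):
        eps = rng.gauss(0.0, sigma_eps)
        out.append(mu + eps + theta * eps_prev)
        eps_prev = eps
    return out


def ma1_one_step_forecasts(x, mu, theta):
    """Apply f_{t,1} = mu + theta * (X_t - f_{t-1,1}) from equation (3.22).

    forecasts[t] is the forecast of x[t] made one period earlier. The first
    forecast is set to mu, which is an assumption of this sketch.
    """
    forecasts = []
    f = mu
    for value in x:
        forecasts.append(f)
        f = mu + theta * (value - f)
    return forecasts


mu, theta, sigma_eps = 0.0, 0.25, 1.0
x = simulate_ma1(50_000, mu, theta, sigma_eps)
forecasts = ma1_one_step_forecasts(x, mu, theta)
mse_forecast = sum((v - f) ** 2 for v, f in zip(x, forecasts)) / len(x)
mse_mean = sum((v - mu) ** 2 for v in x) / len(x)

print(f"rho_1 for theta = 0.10: {ma1_rho1(0.10):.4f}")
print(f"rho_1 for theta = 0.25: {ma1_rho1(0.25):.4f}")
print(f"mean square error using (3.22): {mse_forecast:.4f}")
print(f"mean square error using the mean only: {mse_mean:.4f}")
```

With these settings the recursive forecast should have a mean square error near $\sigma_\varepsilon^2 = 1$, slightly below the value near $(1 + \theta^2)\sigma_\varepsilon^2$ obtained by always forecasting the mean.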
Figure 3.5. Observations from an MA(1) process.


3.5.3 ARMA(1, 1)


Third, consider the combination of the AR(1) and MA(1) models defined by

$$X_t - \mu = \phi(X_{t-1} - \mu) + \varepsilon_t + \theta\varepsilon_{t-1}. \tag{3.23}$$


This mixed model is an autoregressive, moving-average process, which we refer 
to as ARMA(1, 1). It has often been used to model returns and specific examples 
are given in Section 3.6. Squared returns have also been modeled by ARMA (1, 1) 
processes, for example, when returns follow the GARCH(1, 1) model described 
in Section 9.3. 

It is assumed that $0 < |\phi| < 1$, $0 < |\theta| < 1$, and $\phi + \theta \ne 0$, so that $\{X_t\}$ is
stationary, invertible and not white noise. Once more the mean is $\mu$. It is shown
later in this subsection that the variance and the autocorrelations are given by

$$\lambda_0 = \frac{1 + 2\phi\theta + \theta^2}{1 - \phi^2}\,\sigma_\varepsilon^2, \tag{3.24}$$

$$\rho_\tau = A(\phi, \theta)\,\phi^\tau, \quad \tau \ge 1, \tag{3.25}$$

with

$$A(\phi, \theta) = \frac{(1 + \phi\theta)(\phi + \theta)}{\phi\,(1 + 2\phi\theta + \theta^2)}, \tag{3.26}$$

assuming $\phi \ne 0$. Like an AR(1) process, the autocorrelations at positive lags
form a geometric progression, with $\rho_{\tau+1} = \phi\rho_\tau$ when $\tau \ge 1$. However, unlike
AR(1), the ARMA(1, 1) process has $\rho_1 \ne \phi$ when $\theta \ne 0$. When $\phi$ is positive and
$\theta$ is negative it is possible for $\rho_1$ to be very small compared with $\phi$.

Figure 3.6 displays two sets of autocorrelations when $\phi$ is near one and $\phi + \theta$
is near zero, specifically for the parameter pairs $(0.99, -0.95)$ and $(0.9, -0.8)$.
The former pair has been used to produce Figure 3.7, which shows a simulation
of squared percentage returns obtained from the GARCH(1, 1) process. Volatility
clustering effects are clearly visible. The ARMA(1, 1) specification for the
squared terms is given later, by equation (9.16); the innovations are white noise,
but they are not i.i.d.

Figure 3.6. Examples of ARMA(1, 1) autocorrelations.

Figure 3.7. Observations from an ARMA(1, 1) process.
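
The autocorrelation formulas (3.25) and (3.26) are easy to evaluate for the two parameter pairs quoted for Figure 3.6. The sketch below does so and cross-checks the result against autocorrelations computed from the infinite-order moving-average weights that are derived later in this subsection (equation (3.28)). The function names and the truncation length are assumptions of this example.

```python
def arma11_A(phi, theta):
    """The multiplier A(phi, theta) of equation (3.26); requires phi != 0."""
    return (1.0 + phi * theta) * (phi + theta) / (
        phi * (1.0 + 2.0 * phi * theta + theta * theta)
    )


def arma11_autocorrelation(phi, theta, tau):
    """rho_tau from equation (3.25), with rho_0 = 1."""
    if tau == 0:
        return 1.0
    return arma11_A(phi, theta) * phi ** tau


def arma11_autocorrelation_from_ma_weights(phi, theta, tau, n_terms=5000):
    """Cross-check using the truncated infinite moving-average form.

    The weights are psi_0 = 1 and psi_i = (phi + theta) * phi**(i - 1), and
    rho_tau = sum(psi_i * psi_{i+tau}) / sum(psi_i ** 2). Truncating at
    n_terms is an approximation chosen for this sketch.
    """
    psi = [1.0] + [(phi + theta) * phi ** (i - 1) for i in range(1, n_terms + tau)]
    numerator = sum(psi[i] * psi[i + tau] for i in range(n_terms))
    denominator = sum(p * p for p in psi[:n_terms])
    return numerator / denominator


for phi, theta in ((0.99, -0.95), (0.9, -0.8)):
    rhos = [arma11_autocorrelation(phi, theta, tau) for tau in (1, 2, 3)]
    print(f"phi = {phi}, theta = {theta}: A = {arma11_A(phi, theta):.4f}, "
          f"rho_1..rho_3 = {[round(r, 4) for r in rhos]}")
```

For the pair $(0.9, -0.8)$ the first autocorrelation is $0.14$, far below $\phi = 0.9$, which illustrates the remark that $\rho_1$ can be very small compared with $\phi$.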

Sometimes we wish to recover the moving-average parameter $\theta$ when we are
told the values of $A$ and $\phi$. From (3.26), this requires a solution of the quadratic
equation

$$\theta^2 + \frac{1 + (1 - 2A)\phi^2}{(1 - A)\phi}\,\theta + 1 = 0. \tag{3.27}$$

When $\phi > 0$, there is a unique solution within the interval $[-\phi, 0]$, while for
$\phi < 0$ it is within $[0, -\phi]$, in both cases assuming $0 < A < 1$.
The ARMA(1, 1) process can be rewritten using the lag operator as

$$(1 - \phi L)(X_t - \mu) = (1 + \theta L)\varepsilon_t.$$

An infinite-order moving-average model is given by

$$X_t - \mu = \frac{1 + \theta L}{1 - \phi L}\,\varepsilon_t = \Bigl(\sum_{i=0}^{\infty} \phi^i L^i\Bigr)(1 + \theta L)\varepsilon_t = \varepsilon_t + (\phi + \theta)\sum_{i=1}^{\infty} \phi^{i-1}\varepsilon_{t-i}, \tag{3.28}$$

from which the formula for the variance of $X_t$ follows. Likewise, an infinite-order
autoregressive model is given by 


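$$\varepsilon_t = \frac{1 - \phi L}{1 + \theta L}(X_t - \mu) = (1 - \phi L)\sum_{i=0}^{\infty} (-\theta)^i L^i (X_t - \mu),$$

which simplifies to

$$X_t - \mu = (\phi + \theta)\sum_{i=1}^{\infty} (-\theta)^{i-1}(X_{t-i} - \mu) + \varepsilon_t. \tag{3.29}$$

To obtain the autocorrelations, note from (3.28) that any product $(X_{t-\tau} - \mu)\varepsilon_{t-j}$
has zero expectation if time $t - \tau$ is before time $t - j$, i.e. $\tau > j$. Then consider
multiplying both sides of (3.23) by $X_{t-\tau} - \mu$ and taking expectations. When
$\tau \ge 2$, this gives $\lambda_\tau = \phi\lambda_{\tau-1}$, while $\tau = 1$ and $\tau = 0$ respectively give

$$\lambda_1 = \phi\lambda_0 + \theta\sigma_\varepsilon^2 \quad\text{and}\quad \lambda_0 = \phi\lambda_1 + (1 + \phi\theta + \theta^2)\sigma_\varepsilon^2,$$

again making use of (3.28). Eliminating $\sigma_\varepsilon^2$ from these simultaneous equations
gives $\rho_1$, and then $\rho_2$, $\rho_3$, etc., follow from $\rho_2 = \phi\rho_1$, $\rho_3 = \phi\rho_2$, etc.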

All forecasts of $X_{t+1}$, chosen from linear combinations of $X_t, X_{t-1}, \ldots$, must
have mean square error at least equal to the variance of the innovation $\varepsilon_{t+1}$. This
lower bound is attained by replacing $t$ by $t + 1$ in (3.29) and then substituting
zero for $\varepsilon_{t+1}$. The optimal linear forecast of $X_{t+1}$ is thus

$$f_{t,1} = \mu + (\phi + \theta)\sum_{i=1}^{\infty} (-\theta)^{i-1}(X_{t-i+1} - \mu). \tag{3.30}$$

This forecast is a linear combination of the most recent variable and the forecast
made one period earlier:

$$f_{t,1} = \mu + (\phi + \theta)(X_t - \mu) - \theta(f_{t-1,1} - \mu). \tag{3.31}$$

This formula is a statement about the best linear forecast for a random variable.
To forecast observed values we replace the parameters $\mu$, $\phi$, and $\theta$ by estimates
and replace the random variables $X_t$ and $f_{t-1,1}$ respectively by an observed value
and the previous observed forecast.

To forecast an ARMA(1, 1) process further ahead, consider the following equation
obtained by repeatedly using the definition (3.23):

$$X_{t+N} - \mu = \phi^{N-1}(X_{t+1} - \mu) + \sum_{i=1}^{N} c_i\,\varepsilon_{t+i}, \quad N > 1,$$

with each constant $c_i$ a function of $\phi$, $\theta$, and $N$. As the optimal linear forecast
of $\varepsilon_{t+i}$ ($i > 0$) using variables $X_{t-j}$ ($j \ge 0$) is zero, it follows that the optimal
linear forecast of $X_{t+N}$ made at time $t$ is a linear function of $f_{t,1}$. Denoting this
optimal forecast by $f_{t,N}$, it is

$$f_{t,N} = \mu + \phi^{N-1}(f_{t,1} - \mu). \tag{3.32}$$

The optimal linear forecasts for AR(1) and MA(1) processes can be deduced from
(3.31) and (3.32) by respectively substituting $\theta = 0$ and $\phi = 0$.
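
The forecasting recursion can be written in a few lines. The sketch below applies (3.31) to a short list of observed values, extends the result to longer horizons with (3.32), and checks the recursion against the sum in (3.30) truncated at the start of the sample. Using $\mu$ as the forecast of the first observation, the toy data, and the function names are assumptions of this example.

```python
def arma11_one_step_forecasts(x, mu, phi, theta, initial_forecast=None):
    """Recursively apply equation (3.31).

    Element t of the result is the forecast of the next value made after
    observing x[0], ..., x[t]. The recursion needs a forecast of x[0];
    using mu for it is an assumption of this sketch.
    """
    f_prev = mu if initial_forecast is None else initial_forecast
    forecasts = []
    for value in x:
        f_prev = mu + (phi + theta) * (value - mu) - theta * (f_prev - mu)
        forecasts.append(f_prev)
    return forecasts


def arma11_n_step_forecast(f_one_step, mu, phi, horizon):
    """Equation (3.32): forecast of X_{t+N} from the one-step forecast f_{t,1}."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    return mu + phi ** (horizon - 1) * (f_one_step - mu)


def arma11_forecast_from_sum(x, mu, phi, theta):
    """Equation (3.30), truncated at the start of the observed sample."""
    n = len(x)
    total = sum((-theta) ** (i - 1) * (x[n - i] - mu) for i in range(1, n + 1))
    return mu + (phi + theta) * total


observed = [1.0, 2.0]  # toy data
mu, phi, theta = 0.0, 0.5, 0.2  # hypothetical parameter values
forecasts = arma11_one_step_forecasts(observed, mu, phi, theta)
three_step = arma11_n_step_forecast(forecasts[-1], mu, phi, horizon=3)

print("one-step forecasts:", forecasts)
print("forecast from truncated sum (3.30):", arma11_forecast_from_sum(observed, mu, phi, theta))
print("three-step forecast:", three_step)
```

Setting `theta = 0.0` reduces the recursion to the AR(1) forecast (3.19), and setting `phi = 0.0` reduces it to the MA(1) forecasts (3.22).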


3.5.4 ARMA(p, q)


General ARMA processes contain $p$ autoregressive and $q$ moving-average parameters.
When $p$ and $q$ are both positive, we have the general mixed model,

$$X_t - \mu = \sum_{i=1}^{p} \phi_i (X_{t-i} - \mu) + \sum_{j=0}^{q} \theta_j \varepsilon_{t-j}, \tag{3.33}$$

with $\theta_0 = 1$, $\phi_p \ne 0$, $\theta_q \ne 0$. The AR($p$) model is the special case that has
$q = 0$. Likewise, the MA($q$) model has $p = 0$ and omits the first sum. When
$p = q = 0$, the model is merely a constant plus white noise and can then be
referred to as MA(0).

The ARMA($p$, $q$) process is stationary if all the solutions of
$\phi_1 z + \phi_2 z^2 + \cdots + \phi_p z^p = 1$ are outside the unit circle, $|z| = 1$, $z$ here representing a complex
number. The process is said to be invertible if optimal linear forecasts can be
obtained, which requires all solutions of $1 + \theta_1 z + \theta_2 z^2 + \cdots + \theta_q z^q = 0$
to also be outside the unit circle. Box, Jenkins, and Reinsel (1994) describe the
pioneering Box-Jenkins methodology for selecting an appropriate ARMA model.
These models have been used to describe many economic and financial time series
(see, for example, Granger and Newbold 1986; Mills 1999; Tsay 2002). Most
models fitted to data have $p + q \le 2$, as in the three examples we have discussed.
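
The root conditions can be checked directly for low-order models. The sketch below is deliberately restricted to orders of at most two, so that the roots come from the linear or quadratic formula, and it tests whether every root lies outside the unit circle. The function names and the restriction to low orders are choices made for this example. A general-order check would need a polynomial root finder.

```python
import cmath


def roots_up_to_quadratic(c0, c1, c2=0.0):
    """Complex roots of c0 + c1*z + c2*z**2 = 0 (degree at most two)."""
    if c2 != 0.0:
        disc = cmath.sqrt(c1 * c1 - 4.0 * c2 * c0)
        return [(-c1 + disc) / (2.0 * c2), (-c1 - disc) / (2.0 * c2)]
    if c1 != 0.0:
        return [complex(-c0 / c1)]
    return []  # a nonzero constant has no roots


def _pad_to_two(coefficients):
    coefficients = list(coefficients)
    if len(coefficients) > 2:
        raise ValueError("this sketch handles orders up to two only")
    return coefficients + [0.0] * (2 - len(coefficients))


def all_outside_unit_circle(roots):
    return all(abs(z) > 1.0 for z in roots)


def is_stationary(phis):
    """Check the AR condition: roots of 1 - phi_1 z - phi_2 z^2 = 0."""
    phi1, phi2 = _pad_to_two(phis)
    return all_outside_unit_circle(roots_up_to_quadratic(1.0, -phi1, -phi2))


def is_invertible(thetas):
    """Check the MA condition: roots of 1 + theta_1 z + theta_2 z^2 = 0."""
    theta1, theta2 = _pad_to_two(thetas)
    return all_outside_unit_circle(roots_up_to_quadratic(1.0, theta1, theta2))


print("AR(1), phi = 0.9:", is_stationary([0.9]))
print("AR(1), phi = 1.0:", is_stationary([1.0]))
print("AR(2), phi = (0.5, 0.6):", is_stationary([0.5, 0.6]))
print("MA(1), theta = 0.25:", is_invertible([0.25]))
```

For an AR(1) model the single root is $z = 1/\phi$, so the check reproduces the condition $|\phi| < 1$ used earlier.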


3.5.5 Aggregation of Models 


The sum of two independent ARMA processes is also an ARMA process. An 
example that often occurs in this book is given by the sum of an AR(1) process 
and an MA(0) process, say 
$$Z_t = X_t + Y_t,$$

with

$$(1 - \phi L)(X_t - \mu_X) = \varepsilon_t \quad\text{and}\quad Y_t = \mu_Y + \eta_t.$$

Then

$$(1 - \phi L)(Z_t - (\mu_X + \mu_Y)) = \varepsilon_t + (1 - \phi L)\eta_t,$$

which suggests that $\{Z_t\}$ is an ARMA(1, 1) process, with autoregressive parameter
$\phi$. By adding the autocovariances of the $X$- and $Y$-processes, it can be checked
that the autocovariances of the $Z$-process are indeed those of an ARMA(1, 1)
process. The term $A$ in equation (3.26) equals $\operatorname{var}(X_t)/\operatorname{var}(Z_t)$ and hence the
moving-average parameter $\theta$ can be obtained from (3.27). Further algebra establishes
that the terms $\{\xi_t\}$ defined by

$$(1 - \phi L)(Z_t - (\mu_X + \mu_Y)) = (1 + \theta L)\xi_t$$

form a white noise process as expected.
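
The steps in this example can be sketched in code: compute $A = \operatorname{var}(X_t)/\operatorname{var}(Z_t)$, solve the quadratic (3.27) for the root with $|\theta| < 1$, and confirm that (3.26) returns the same $A$. The innovation variances and function names below are hypothetical choices for illustration.

```python
import math


def arma11_A(phi, theta):
    """The multiplier A(phi, theta) of equation (3.26)."""
    return (1.0 + phi * theta) * (phi + theta) / (
        phi * (1.0 + 2.0 * phi * theta + theta * theta)
    )


def recover_theta(A, phi):
    """Solve the quadratic (3.27) for the root with |theta| < 1.

    The two roots have product one, so the root of smaller magnitude is
    returned. Assumes 0 < A < 1 and 0 < |phi| < 1.
    """
    b = (1.0 + (1.0 - 2.0 * A) * phi * phi) / ((1.0 - A) * phi)
    disc = math.sqrt(max(b * b - 4.0, 0.0))
    return (-b + math.copysign(disc, b)) / 2.0


def ar1_plus_noise_to_arma11(phi, var_eps, var_eta):
    """Return (phi, theta, A) for Z_t = X_t + Y_t.

    X_t is AR(1) with innovation variance var_eps, and Y_t is a constant
    plus independent white noise with variance var_eta.
    """
    var_x = var_eps / (1.0 - phi * phi)
    A = var_x / (var_x + var_eta)
    return phi, recover_theta(A, phi), A


# Hypothetical values: var(X_t) is close to 1 and var(Y_t) = 1, so A is close to 0.5.
phi_z, theta_z, A_z = ar1_plus_noise_to_arma11(phi=0.9, var_eps=0.19, var_eta=1.0)
print(f"phi = {phi_z}, theta = {theta_z:.4f}, A = {A_z:.4f}")
print(f"A recomputed from (3.26): {arma11_A(phi_z, theta_z):.4f}")
```

The recovered $\theta$ is negative and lies in $[-\phi, 0]$, in line with the statement that accompanies (3.27).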

Likewise, it can be shown that the sum of two independent AR(1) processes 
is an ARMA(2, 1) process when the two autoregressive parameters are different. 
The most general result is shown in Granger and Newbold (1986) as 


$$\text{ARMA}(p_1, q_1) + \text{ARMA}(p_2, q_2) = \text{ARMA}(p, q) \tag{3.34}$$

with

$$p \le p_1 + p_2 \quad\text{and}\quad q \le \max(p_1 + q_2,\ p_2 + q_1).$$


Also, the inequalities can be replaced by equalities for most parameter configu- 
rations. 


3.5.6 Aggregation through Time 
The autocorrelations of multi-period returns are functions of the autocorrelations 
of single-period returns. Suppose the j-period sum process is defined by 
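$$Y_t = X_{j(t-1)+1} + \cdots + X_{jt}.$$

Then the first-lag autocorrelation of $\{Y_t\}$, denoted by $\rho_1^{(j)}$, can be shown to equal
the following nonlinear function of the first $2j - 1$ autocorrelations of $\{X_t\}$,
denoted by $\rho_\tau$:

$$\rho_1^{(j)} = \frac{\sum_{\tau=1}^{j} \tau\rho_\tau + \sum_{\tau=j+1}^{2j-1} (2j - \tau)\rho_\tau}{j + 2\sum_{\tau=1}^{j-1} (j - \tau)\rho_\tau}. \tag{3.35}$$

When $\{X_t\}$ is an ARMA(1, 1) process for which $\rho_\tau = A\phi^\tau$, firstly the above
expression equals

$$\rho_1^{(j)} = \frac{A\phi(1 - \phi^j)^2}{j(1 - \phi)^2 + 2A\phi[j(1 - \phi) - (1 - \phi^j)]} \tag{3.36}$$

and secondly the higher lag autocorrelations of $\{Y_t\}$ are

$$\rho_\tau^{(j)} = \phi^{j(\tau-1)}\rho_1^{(j)}, \quad \tau \ge 1.$$

Hence $\{Y_t\}$ is itself an ARMA(1, 1) process, with autoregressive parameter $\phi^j$.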
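
The general formula (3.35) and the ARMA(1, 1) closed form (3.36) can be compared numerically. The sketch below implements both and checks that they agree when $\rho_\tau = A\phi^\tau$. The parameter values and function names are hypothetical.

```python
def multi_period_rho1(rho, j):
    """First-lag autocorrelation of j-period sums, equation (3.35).

    rho is a function returning the single-period autocorrelation rho_tau.
    """
    numerator = sum(tau * rho(tau) for tau in range(1, j + 1))
    numerator += sum((2 * j - tau) * rho(tau) for tau in range(j + 1, 2 * j))
    denominator = j + 2.0 * sum((j - tau) * rho(tau) for tau in range(1, j))
    return numerator / denominator


def multi_period_rho1_arma11(A, phi, j):
    """Closed form (3.36), valid when rho_tau = A * phi**tau."""
    numerator = A * phi * (1.0 - phi ** j) ** 2
    denominator = j * (1.0 - phi) ** 2 + 2.0 * A * phi * (
        j * (1.0 - phi) - (1.0 - phi ** j)
    )
    return numerator / denominator


def multi_period_rho(A, phi, j, tau):
    """Higher-lag autocorrelations of the j-period sums, for tau >= 1."""
    return phi ** (j * (tau - 1)) * multi_period_rho1_arma11(A, phi, j)


A, phi = 0.2, 0.8  # hypothetical ARMA(1, 1) autocorrelation parameters


def single_period_rho(tau):
    return A * phi ** tau


for j in (1, 2, 5, 20):
    general = multi_period_rho1(single_period_rho, j)
    closed = multi_period_rho1_arma11(A, phi, j)
    print(f"j = {j:2d}: general {general:.6f}, closed form {closed:.6f}")
```

With $j = 1$ both expressions reduce to $\rho_1 = A\phi$, which is a quick sanity check on the index ranges.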
3.6 Examples of ARMA(1, 1) Specifications


We may anticipate from the aggregation results that autoregressive and moving- 
average terms will appear in models defined by the sum of variables that follow 
simpler models. In particular, ARMA(1, 1) processes for returns have been moti- 
vated by supposing the returns process is the sum of two independent processes, 
one of which is white noise. The autocorrelations of the sum can then be either 
positive or negative, depending on the assumptions that define the returns process. 


3.6.1 A Negative Autocorrelation Example 


Negative dependence among returns occurs in the models investigated by Fama 
and French (1988) and Poterba and Summers (1988). Market prices $P_t$ are supposed
to differ from rational (or correct or fundamental) prices $P_t^*$ by temporary
pricing errors. The error terms $u_t$ are defined by

$$\log(P_t) = \log(P_t^*) + u_t \tag{3.37}$$

and then

$$R_t = R_t^* + u_t - u_{t-1} \tag{3.38}$$

for terms $R_t$ and $R_t^*$ that respectively represent market and rational returns. The
first component of the market return is the rational response to fundamental information
given by $R_t^*$, which is assumed to be white noise from the theory of
efficient markets. The second component is $u_t - u_{t-1}$. Assuming pricing errors
are only temporary, it is plausible to assume that the process $\{u_t\}$ is AR(1) with
a positive autoregressive parameter $\phi$. The first differences of an AR(1) process
are an ARMA(1, 1) process, with autoregressive and moving-average parameters
respectively equal to $\phi$ and $-1$. Therefore, the returns process is the sum of an
ARMA(1, 1) process and independent white noise, which is also ARMA(1, 1)
from the aggregation result stated in equation (3.34).

The autocorrelations of returns depend on the proportion of returns variance
that is due to incorrectly interpreted information, namely

$$B = \frac{\operatorname{var}(u_t - u_{t-1})}{\operatorname{var}(R_t)} = \frac{2(1 - \phi)\operatorname{var}(u_t)}{\operatorname{var}(R_t)}, \tag{3.39}$$

and on the persistence of the errors, measured by $\phi = \operatorname{cor}(u_t, u_{t+1})$. As

$$\operatorname{cov}(R_t, R_{t+\tau}) = \operatorname{cov}(u_t - u_{t-1}, u_{t+\tau} - u_{t+\tau-1}) = \phi^{\tau-1}(2\phi - 1 - \phi^2)\operatorname{var}(u_t) \quad\text{for all } \tau > 0,$$

these return autocorrelations are

$$\rho_\tau = A\phi^\tau, \quad \tau \ge 1, \tag{3.40}$$

with

$$A = -\frac{(1 - \phi)B}{2\phi}. \tag{3.41}$$

As $B$ and $\phi$ are positive, $A$ must be negative and hence positive persistence of the
pricing errors implies negative autocorrelation at all nonzero lags.

An extreme example is given by supposing $B = 0.75$ and $\phi = 0.998$, so that
most of the returns variance is due to very slowly corrected pricing errors. With $t$
counting trading days, the first autocorrelation of daily returns is tiny at $-0.0008$.
However, the first autocorrelation is then substantial for long-horizon returns and
equals $-0.24$ for returns measured over three years (using equation (3.36) with
$j = 750$), which is near to an empirical estimate in Fama and French (1988).
Figure 3.8 shows a series of simulated daily values of $P_t$ and $P_t^*$ for a five-year
period. The market prices are far above the rational prices for a long time during
this simulation, although the pricing error is certain to change sign at some later
time.

Figure 3.8. Simulated market and rational prices.
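
The numbers in the extreme example can be reproduced from (3.41) and (3.36). The sketch below uses the values $B = 0.75$, $\phi = 0.998$, and $j = 750$ quoted in the text. The function names are hypothetical.

```python
def pricing_error_A(B, phi):
    """Equation (3.41): A = -(1 - phi) * B / (2 * phi)."""
    return -(1.0 - phi) * B / (2.0 * phi)


def multi_period_rho1_arma11(A, phi, j):
    """Equation (3.36): first-lag autocorrelation of j-period returns."""
    numerator = A * phi * (1.0 - phi ** j) ** 2
    denominator = j * (1.0 - phi) ** 2 + 2.0 * A * phi * (
        j * (1.0 - phi) - (1.0 - phi ** j)
    )
    return numerator / denominator


B, phi = 0.75, 0.998
A = pricing_error_A(B, phi)
rho_daily = A * phi  # first autocorrelation of daily returns, from (3.40)
rho_three_year = multi_period_rho1_arma11(A, phi, 750)

print(f"A = {A:.6f}")
print(f"daily first-lag autocorrelation = {rho_daily:.5f}")
print(f"three-year first-lag autocorrelation = {rho_three_year:.4f}")
```

The daily value is about $-0.00075$, which the text reports as $-0.0008$, and the three-year value is about $-0.24$.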


3.6.2 A Positive Autocorrelation Example 


A simpler model for returns, which possesses positive dependence, is given by 
separating returns $R_t$ into a “trend” component $T_t$ and an independent white
noise component $\varepsilon_t$ (Taylor 1982a). The trend term is persistent and it represents
either time-varying expectations or the market's response to slowly interpreted
information. Then

$$R_t = T_t + \varepsilon_t \tag{3.42}$$

and the proportion of the returns variance that is now explained by the persistent
component equals

$$A = \frac{\operatorname{var}(T_t)}{\operatorname{var}(R_t)}. \tag{3.43}$$

The simplest credible model for $\{T_t\}$ is an AR(1) process with a positive autoregressive
parameter $\phi$. The sum of the AR(1) and white noise components is an
ARMA(1, 1) process, from equation (3.34). As

$$\operatorname{cov}(R_t, R_{t+\tau}) = \operatorname{cov}(T_t, T_{t+\tau}) \quad\text{for all } \tau \ne 0,$$

the autocorrelations of returns have the same mathematical form as for the previous
model, namely

$$\rho_\tau = A\phi^\tau, \quad \tau \ge 1. \tag{3.44}$$

Now, however, $A$ is positive. A plausible example of the parameter values is
$A = 0.02$ and $\phi = 0.95$. The theoretical autocorrelations are then near empirical
estimates for daily returns from currency futures in the 1970s and 1980s (Taylor 
1994b). 


3.7 ARIMA Processes 


The acronym ARIMA($p$, 1, $q$) is used for a process $\{X_t\}$ when it is nonstationary
but its first differences, $X_t - X_{t-1}$, follow a stationary ARMA($p$, $q$) process,
defined by equation (3.33). The additional letter “I” states that the process $\{X_t\}$
is integrated, while the numeral “1” indicates that only one application of differencing
is required to achieve stationarity.

For example, the pure I(1) process with $p = q = 0$ might be considered for the
logarithm of prices, $X_t = \log(P_t)$. Then the first differences are returns, when
there are no dividends, and they are simply a constant plus a white noise process:

$$X_t - X_{t-1} = \mu + \varepsilon_t. \tag{3.45}$$


This process for the log-price has a random walk property, a concept that is defined 
more precisely in Section 5.2. 


3.8 ARFIMA Processes 


For each of our ARMA and ARIMA examples there is a filter that transforms the 
process $\{X_t\}$ into a constant plus a white noise process. For the AR(1), MA(1),
and I(1) processes these filters are respectively $1 - \phi L$, $(1 + \theta L)^{-1}$, and $1 - L$,
with $L$ the lag operator that was introduced after equation (3.16).

Fractional integration is a property of stochastic processes that employ a more
complicated filter, namely $(1 - L)^d$ for some number $d$ that is not an integer;
typically, d is between zero and one. Applications of the fractional filter are 
difficult. They are referred to in a few sections of this book and are particularly 
interesting when modeling volatility. Readers who wish to concentrate on simpler 
models should skip the remainder of this section. 

The fractional filter is derived in Granger (1980) as a particular limit of an 
infinite sum of AR(1) processes that have different AR parameters. This limit is 
related to volatility dynamics and information flows by Andersen and Bollerslev 
(1997a). The filter uses only one parameter to efficiently capture some features of 
empirical data. It is defined by an infinite series expansion that commences with 
the terms $1 - dL + \tfrac{1}{2}d(d - 1)L^2 - \cdots$. 

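The coefficients of the series expansion of the fractional filter can be generated recursively from the binomial series. The sketch below is illustrative only; the function name is hypothetical, and the recursion $a_j = a_{j-1}(j - 1 - d)/j$ with $a_0 = 1$ is the standard binomial-series identity rather than a formula quoted from the text.

```python
def fractional_filter_coefficients(d, n_terms):
    """Coefficients a_0, ..., a_{n_terms-1} in (1 - L)^d = sum_j a_j L^j.

    Uses the binomial-series recursion a_j = a_{j-1} * (j - 1 - d) / j.
    """
    coeffs = [1.0]
    for j in range(1, n_terms):
        coeffs.append(coeffs[-1] * (j - 1 - d) / j)
    return coeffs


# For d = 0.4 the first terms are 1, -d, d(d - 1)/2, ...
print(fractional_filter_coefficients(0.4, 5))

# For d = 1 the filter collapses to the ordinary difference filter 1 - L.
print(fractional_filter_coefficients(1.0, 4))
```

For non-integer d the coefficients never become exactly zero, which is why the filter involves the whole past of the process.
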
An ARMA(p, q) process can be described compactly by the equation 


$$\phi(L)(X_t - \mu) = \theta(L)\varepsilon_t,$$


where the filters $\phi(L)$ and $\theta(L)$ are polynomials, respectively of orders p and 
q. The ARFIMA(p, d, q) process of Granger and Joyeux (1980) and Hosking 
(1981) also contains the fractional filter. When this process is stationary, with 
mean zero, we can write it as 


$$(1 - L)^d \phi(L) X_t = \theta(L)\varepsilon_t, \tag{3.46}$$

while for a general stationary mean μ a precise definition is 

$$X_t = \mu + (1 - L)^{-d}\,\phi(L)^{-1}\,\theta(L)\,\varepsilon_t. \tag{3.47}$$


This ARFIMA process is stationary when d < 0.5. Assuming d is positive, it is a 
special case of a long memory process. For an excellent review of long memory 
processes, see Baillie (1996). 

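There are three essential differences between a stationary ARMA process and 
a stationary ARFIMA process that has 0 < d < 0.5. These are stated for the sum 
of the first n autocorrelations, the n-period variance ratio and the spectral density 
function, defined by 


$$S_n = \sum_{\tau=1}^{n} \rho_\tau, \qquad VR_n = \frac{\operatorname{var}(X_{t+1} + \cdots + X_{t+n})}{n \operatorname{var}(X_t)} = 1 + 2\sum_{\tau=1}^{n-1} \frac{n - \tau}{n}\,\rho_\tau, \tag{3.48}$$

and 

$$s(\omega) = \frac{\sigma^2}{2\pi}\Bigl[1 + 2\sum_{\tau=1}^{\infty} \rho_\tau \cos(\tau\omega)\Bigr].$$
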
For a stationary ARMA process, the autocorrelations are geometrically bounded 
(so that $|\rho_\tau| < C\psi^\tau$ for some C > 0 and 1 > ψ > 0), which ensures the 
following limiting behavior: 


$$S_n \to C_1, \quad VR_n \to C_2, \quad s(\omega) \to C_3, \quad \text{as } n \to \infty,\ \omega \to 0, \tag{3.49}$$


for constants $C_1$, $C_2$, $C_3$. In contrast, none of these limits exists for a stationary 
ARFIMA process. Instead, the autocorrelations have a hyperbolic decay rate, the 
variance ratio increases without limit, and the spectral density is unbounded for 
low frequencies. Specifically, 


$$\frac{\rho_\tau}{\tau^{2d-1}} \to D_1, \quad \frac{VR_n}{n^{2d}} \to D_2, \quad \frac{s(\omega)}{\omega^{-2d}} \to D_3, \quad \text{as } \tau \to \infty,\ n \to \infty,\ \omega \to 0, \tag{3.50}$$


for positive constants $D_1$, $D_2$, $D_3$. 
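
The contrast between geometric and hyperbolic decay can be checked numerically. The sketch below compares an AR(1) process with an ARFIMA(0, d, 0) process. It relies on two standard results that are not stated above and are therefore assumptions of the example: the AR(1) autocorrelations $\phi^\tau$, and the recursion $\rho_\tau = \rho_{\tau-1}(\tau - 1 + d)/(\tau - d)$ for ARFIMA(0, d, 0), whose limit $D_1$ equals $\Gamma(1-d)/\Gamma(d)$. The values d = 0.3 and φ = 0.5 are arbitrary.

```python
import math


def arfima_autocorrelations(d, max_lag):
    """rho_0, ..., rho_max_lag for ARFIMA(0, d, 0), by the standard recursion."""
    rho = [1.0]
    for tau in range(1, max_lag + 1):
        rho.append(rho[-1] * (tau - 1 + d) / (tau - d))
    return rho


def ar1_autocorrelations(phi, max_lag):
    """rho_tau = phi ** tau for a stationary AR(1) process."""
    return [phi ** tau for tau in range(max_lag + 1)]


def variance_ratio(rho, n):
    """VR_n = 1 + 2 * sum_{tau=1}^{n-1} (1 - tau/n) * rho_tau."""
    return 1.0 + 2.0 * sum((1.0 - tau / n) * rho[tau] for tau in range(1, n))


d, phi, max_lag = 0.3, 0.5, 4000
rho_long = arfima_autocorrelations(d, max_lag)
rho_short = ar1_autocorrelations(phi, max_lag)

for n in (250, 1000, 4000):
    vr_short = variance_ratio(rho_short, n)          # settles at a constant
    vr_long = variance_ratio(rho_long, n)            # keeps growing
    scaled_vr = vr_long / n ** (2 * d)               # roughly stable
    scaled_rho = rho_long[n] / n ** (2 * d - 1)      # roughly stable
    print(n, round(vr_short, 4), round(vr_long, 2),
          round(scaled_vr, 4), round(scaled_rho, 4))
```

The AR(1) variance ratio settles near (1 + φ)/(1 − φ), while the ARFIMA variance ratio only stabilizes after division by $n^{2d}$.
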


3.9 Linear Stochastic Processes 


3.9.1 Definition 


Any stationary ARMA process equals a constant plus a moving average of white 
noise innovations $\{\varepsilon_t\}$, perhaps of infinite order. Equations (3.18) and (3.28) are 
examples that show AR(1) and ARMA(1, 1) processes can be represented by 
MA(∞) processes. In general, a stationary process always has a representation 


$$X_t = \mu + \sum_{j=0}^{\infty} \theta_j \varepsilon_{t-j} \tag{3.51}$$


for some set of constants $\theta_j$ and some white noise process $\{\varepsilon_t\}$. The innovations 
will be uncorrelated, but they may not be independent variables. There will be a 
representation (3.51) for a stationary process if the $X_t$ have identical distributions 
and do not contain a deterministic component (Granger and Newbold 1986). 

Equation (3.51) is a statement that $X_t$ is a linear function of the uncorrelated 
variables $\{\varepsilon_{t-j},\ j \ge 0\}$. Following statistical convention, this property of the 
process $\{X_t\}$ is not sufficient to call the process linear. The reason is that a process 
can be both a nonlinear function of i.i.d. variables and a linear function of white 
noise variables; the ARCH(1) model of equation (3.11) defines a simple example. 
The convention is to call such a process nonlinear because i.i.d. variables are 
the preferred building blocks when constructing models with interdependence 
between variables. 

A process $\{X_t\}$ is defined to be linear if it can be described by (3.51) with 
zero-mean innovations $\{\varepsilon_t\}$ that are i.i.d. Any process which is not linear is called 
nonlinear. A stationary, Gaussian process is linear. Also, a linear process is strictly 
stationary. 

The variance of a linear, invertible, ARMA process, conditional upon its past, 
is a constant: 

$$\operatorname{var}(X_t \mid X_{t-1}, X_{t-2}, \ldots) = \operatorname{var}(\varepsilon_t).$$


Constant conditional variance is arguably the fundamental property that is not 
possessed by the process that generates returns. Periods of high price volatility 
and other periods of low price volatility oblige us to discard the idea of constant 
conditional variance and so we should expect satisfactory models to be nonlinear. 


3.9.2 Autocorrelation Tests 


The distinctions between white noise and zero-mean i.i.d. variables, and between 
linear and nonlinear processes, are extremely important for finance research. One 
important context is when sample autocorrelations are used to test the hypothesis 
that returns are uncorrelated through time. 

A time series of n returns can be used to estimate the autocorrelations $\rho_\tau$ of 
the process generating the observations, assuming the process is stationary. These 
estimates can be interpreted as the realized values of random variables, denoted 
by $\hat{\rho}_\tau$. To perform random walk tests it is necessary to apply results about the 
distributions of the variables $\hat{\rho}_\tau$. Asymptotic results are known for large samples of 
returns generated by a linear process. For the special hypothesis that returns have 
independent and identical finite-variance distributions, the distribution of $\sqrt{n}\,\hat{\rho}_\tau$ 
converges to N(0, 1) as $n \to \infty$ and thus the variance of $\hat{\rho}_\tau$ is approximately 
1/n (Anderson and Walker 1964). The conclusion about the variance of $\hat{\rho}_\tau$ is 
generally false for a nonlinear, white noise process (Taylor 1984). Such processes 
can have a higher variance for $\hat{\rho}_\tau$. This means that if the sample autocorrelations 
of returns are judged to be significantly different from zero, by supposing their 
standard errors are $1/\sqrt{n}$, then it is wrong to reject the hypothesis that returns are 
uncorrelated using the same significance level. 
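
A small Monte Carlo experiment can make the point concrete. The sketch below estimates $n \operatorname{var}(\hat{\rho}_1)$ for i.i.d. Gaussian variables and for a nonlinear white noise process. Everything specific here is an assumption of the example: the ARCH(1) recursion $X_t = \sqrt{\omega + \alpha X_{t-1}^2}\,Z_t$ with standard normal $Z_t$ is a common textbook form (equation (3.11) is not reproduced in this section), and the parameter values, sample size, and number of replications are arbitrary.

```python
import math
import random
import statistics


def sample_autocorrelation(x, lag=1):
    """Sample autocorrelation of the series x at the given lag."""
    m = statistics.fmean(x)
    den = sum((v - m) ** 2 for v in x)
    num = sum((a - m) * (b - m) for a, b in zip(x, x[lag:]))
    return num / den


def simulate_iid(n, rng):
    """n independent standard normal variables."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def simulate_arch1(n, rng, omega=0.5, alpha=0.5, burn_in=100):
    """Assumed ARCH(1) form: X_t = sqrt(omega + alpha * X_{t-1}^2) * Z_t."""
    x_prev = 0.0
    out = []
    for t in range(n + burn_in):
        x_prev = math.sqrt(omega + alpha * x_prev ** 2) * rng.gauss(0.0, 1.0)
        if t >= burn_in:
            out.append(x_prev)
    return out


def scaled_variance_of_rho1(simulator, n=500, replications=400, seed=1):
    """Monte Carlo estimate of n * var(rho_hat_1); about 1 for i.i.d. data."""
    rng = random.Random(seed)
    estimates = [sample_autocorrelation(simulator(n, rng))
                 for _ in range(replications)]
    return n * statistics.pvariance(estimates)


iid_ratio = scaled_variance_of_rho1(simulate_iid)
arch_ratio = scaled_variance_of_rho1(simulate_arch1)
print("n * var(rho_hat_1), i.i.d.:  ", iid_ratio)
print("n * var(rho_hat_1), ARCH(1):", arch_ratio)
```

Both simulated processes are uncorrelated, yet only the i.i.d. case should give a value near one; the ARCH(1) value is noticeably larger, so standard errors of $1/\sqrt{n}$ would be too small for that process.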


3.10 Continuous-Time Stochastic Processes 


Theoretical prices for options and other derivative securities are usually derived 
from stochastic processes defined for all times t on a continuous scale. Two 
important examples are introduced here, while more complicated processes are 
described in Chapters 13 and 14. 

A Wiener process $\{W(t)\}$ commences at W(0) = 0. All of its increments have 
normal distributions. Whenever time t is after time s, the increment W(t) − W(s) 
has a normal distribution, with mean zero and variance equal to the time 
difference t − s. Furthermore, increments over nonoverlapping time intervals are 
independent random variables. Thus, if $t_1 < t_2 < t_3 < t_4$, then $W(t_2) - W(t_1)$ 
is independent of $W(t_4) - W(t_3)$. The stochastic differential dW is a quantity 
which appears in continuous-time analysis. It is defined by the stochastic integral 
$W(T) = \int_0^T dW(s)$. 

A geometric Brownian motion process $\{P(t)\}$ is often represented by the cryptic 
equation 

$$dP/P = \mu\,dt + \sigma\,dW.$$


This equation describes the price process assumed in derivations of the 
Black-Scholes and related option pricing formulae. Itô's lemma, which is a stochastic 
calculus theorem, yields an equivalent equation for the process followed by the 
logarithm of P(t): 


$$d(\log P) = (\mu - \tfrac{1}{2}\sigma^2)\,dt + \sigma\,dW.$$

Integrating all terms from time zero until time t gives 

$$\log P(t) = \log P(0) + (\mu - \tfrac{1}{2}\sigma^2)t + \sigma W(t).$$


Therefore, the change in the logarithm of the price from time s until a later time 
t has a normal distribution, with both the mean and the variance proportional to 
the time difference t − s: 

$$\log P(t) - \log P(s) \sim N\bigl((\mu - \tfrac{1}{2}\sigma^2)(t - s),\ \sigma^2(t - s)\bigr). \tag{3.52}$$


The discrete-time process defined by $X_t = \log P(t) - \log P(t - 1)$, for integers 
t, is i.i.d. and Gaussian. Thus geometric Brownian motion for prices implies that 
the logarithms of prices follow a random walk, with steps that have independent 
and identical normal distributions. 
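
Equation (3.52) gives an exact way to simulate geometric Brownian motion on a discrete grid, because each log-price change is an independent normal variable. The sketch below is a minimal illustration; the function names and the values μ = 0.10, σ = 0.20 per year, and dt = 1/250 are hypothetical choices.

```python
import math
import random
import statistics


def simulate_gbm_log_returns(n, mu, sigma, dt=1.0, seed=0):
    """Draw n i.i.d. changes in log P over steps of length dt, using (3.52):
    mean (mu - sigma^2 / 2) * dt and variance sigma^2 * dt."""
    rng = random.Random(seed)
    drift = (mu - 0.5 * sigma ** 2) * dt
    vol = sigma * math.sqrt(dt)
    return [drift + vol * rng.gauss(0.0, 1.0) for _ in range(n)]


def prices_from_log_returns(p0, log_returns):
    """Rebuild the price path P_0, ..., P_n from the log-price changes."""
    prices = [p0]
    for x in log_returns:
        prices.append(prices[-1] * math.exp(x))
    return prices


mu, sigma, dt = 0.10, 0.20, 1.0 / 250.0
x = simulate_gbm_log_returns(10000, mu, sigma, dt)
path = prices_from_log_returns(100.0, x)

print("sample mean:", statistics.fmean(x), "theory:", (mu - 0.5 * sigma ** 2) * dt)
print("sample s.d.:", statistics.stdev(x), "theory:", sigma * math.sqrt(dt))
```

The log-price path is then a random walk whose steps are independent and identically distributed normal variables, as the text states.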


3.11 Notation for Random Variables and Observations 


The distinction between random variables X, Y, Z, ... and possible outcomes or 
observations x, y, z, ... has been emphasized in this chapter by using uppercase 
letters for the former variables and lowercase letters for the latter variables. It is 
cumbersome to maintain this distinction and, consequently, it is now discarded. 
The notation r, for example, may now sometimes refer to an observed return 
and at other times it will refer to a random variable that models the probability 
distribution of a return. It should always be possible to infer from the context 
whether a symbol refers to a random variable or to an observation. 


4 


Stylized Facts for Financial Returns 


Several statistical properties of daily returns are documented and discussed in 
this chapter, before testing hypotheses and estimating time-series models in later 
chapters. These properties are presented for the means, variances, distributions, 
and autocorrelations of returns by referring to empirical evidence obtained from 
many datasets, including the twenty time series introduced in Chapter 2. This 
chapter ends by emphasizing that linear stochastic processes cannot explain all 
the empirical properties of returns. 


4.1 Introduction 


General properties that are expected to be present in any set of returns are called 
stylized facts. There are three important properties that are found in almost all 
sets of daily returns obtained from a few years of prices. First, the distribution 
of returns is not normal. Second, there is almost no correlation between returns 
for different days. Third, the correlations between the magnitudes of returns on 
nearby days are positive and statistically significant. These properties can all be 
explained by changes through time in volatility, as will be seen in Chapters 8-11. 
This chapter also covers many other statistical characteristics of daily returns. 
Those readers who only wish to learn about the stylized facts will find them 
discussed in Sections 4.7, 4.9, and 4.10. 

Incidentally, the three major stylized facts are pervasive across time as well 
as across markets. They are apparent in daily returns at the Florentine currency 
market from 1389 to 1432 (Booth and Gurun 2004), the London market for stocks 
from 1724 to 1740 (Harrison 1998), and the London fixed-income market from 
1821 to 1860 (Mitchell, Brown, and Easton 2002). 

The first part of this chapter is about features of the distribution of returns. 
After defining summary statistics in Section 4.2, the empirical means and standard 
deviations of returns are discussed in Sections 4.3 and 4.4. Average returns have 
also been estimated for calendar periods, such as all Mondays and all days in 
January. Calendar anomalies are generally ignored in this book after the detailed 
review provided in Section 4.5. 
Some information about the shape of the distribution of returns is given by the 
skewness and kurtosis statistics that are discussed in Section 4.6. Comparisons 
with the normal shape in Section 4.7 show that the distribution of daily returns 
has more observations near the mean and more in the tails than are expected 
from a normal distribution. A survey of more appropriate distributions is given in 
Section 4.8. 

The second part of the chapter summarizes the dependence between returns on 
different days by autocorrelation statistics. The estimates for returns $r_t$ are 
important because they help to show that it is difficult to predict future returns using a 
linear combination of previous returns. These correlation estimates are discussed 
in Section 4.9. More striking results, however, are obtained in Section 4.10 by 
considering autocorrelations for transformed data, such as absolute returns $|r_t|$. 
The correlation estimates are then positive and often fairly substantial for absolute 
returns separated by a few days. Returns are therefore dependent on the returns 
obtained on previous days, but the form of the dependence is not linear. Any 
satisfactory model for returns must be a nonlinear stochastic process, as is shown in 
Section 4.11. 


4.2 Summary Statistics 


The statistical characteristics of the distribution of a set of returns can be 
summarized by numbers such as their mean ($\bar{r}$), standard deviation (s), skewness (b), 
and kurtosis (k). These statistics are defined for a set of n returns $\{r_1, r_2, \ldots, r_n\}$ 
by 

These summary statistics are presented in Table 4.1 for the twenty time series of 
daily returns introduced in Section 2.6. The table also includes the minimum and 
maximum returns, three columns that refer to annual returns, and a column that 
contains the test statistic 


$$z = \frac{\bar{r}}{s/\sqrt{n}}.$$

The z-statistic is used to assess the null hypothesis that the expected return is zero. 
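
The summary statistics and the z-statistic are straightforward to compute. The sketch below is illustrative and rests on an assumption: it uses the divisor n − 1 for the variance and for the standardized third and fourth moments, which is one common convention and may differ from the definitions intended here. The function name and the sample data are hypothetical.

```python
import math


def summary_statistics(returns):
    """Mean, standard deviation, skewness, kurtosis, and z for a list of returns.

    Assumption: moments about the mean are divided by n - 1.
    """
    n = len(returns)
    mean = sum(returns) / n
    dev = [r - mean for r in returns]
    s = math.sqrt(sum(e ** 2 for e in dev) / (n - 1))
    b = sum(e ** 3 for e in dev) / ((n - 1) * s ** 3)
    k = sum(e ** 4 for e in dev) / ((n - 1) * s ** 4)
    z = mean / (s / math.sqrt(n))   # test statistic for a zero expected return
    return {"mean": mean, "sd": s, "skewness": b, "kurtosis": k, "z": z}


# Hypothetical daily returns, used only to exercise the function.
example = [0.012, -0.008, 0.003, -0.021, 0.015, 0.001, -0.004, 0.009]
for name, value in summary_statistics(example).items():
    print(name, value)
```

A symmetric sample gives zero skewness, and z grows with the square root of the sample size for a fixed ratio of mean to standard deviation.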

Several statistics discussed in this chapter are sensitive to extreme outliers in 
the returns data. The stock return series include such outliers around the crash 
on 19 October 1987. Consequently, all the returns in the week of the crash are 
excluded from the calculations of the skewness and kurtosis statistics shown in 
Table 4.1. The minimum stock returns all occur on either Monday 19th or Tuesday 
20th of the crash week and a majority of the maximum returns occur on Wednesday 
21st. 


Table 4.1. Summary statistics for time series of returns. 

Series                   Type   10^4 r̄   10^2 s       b       k       G%       A%      A*%       z

S&P 500-share            S        6.42     0.98   −0.67   10.44    17.62    19.23    19.05    3.30
S&P 500-share            F        3.60     1.35   −0.55   10.10     9.53    10.85    12.08    1.34
Coca Cola                S       11.67     1.69    0.08    5.68    34.33    36.89    39.30    3.46
General Electric         S        7.42     1.51    0.03    5.43    20.65    22.35    24.17    2.48
General Motors           S        5.58     1.76    0.13    4.56    15.16    17.45    19.77    1.59
FT 100-share             S        3.60     0.97   −0.19    5.94     9.55    10.47    10.86    1.87
FT 100-share             F        1.44     1.12   −0.23    5.79     3.72     4.52     5.38    0.65
Glaxo                    S       14.73     1.79    0.33    6.93    45.15    54.30    51.16    4.14
Marks & Spencer          S        7.25     1.66    0.003   4.40    20.14    23.30    24.39    2.20
Shell                    S        7.63     1.30    0.23    5.18    21.29    23.14    23.91    2.95
Nikkei 225-share         S        2.14     1.33    0.35   10.14     5.50     8.75     7.82    0.81
Treasury bonds           F        2.73     0.78    0.09    4.61     7.14     7.65     7.97    1.75
3-month sterling bills   F       −0.52     0.16    2.29   59.84    −1.31    −1.28    −1.28   −1.64
DM/$                     F        0.21     0.74    0.27    5.19     0.53     1.61     1.23    0.14
Sterling/$               F        0.60     0.76    0.28    5.71     1.53     3.13     2.27    0.40
Swiss franc/$            F       −0.54     0.82    0.22    4.57    −1.35     0.14    −0.52   −0.33
Yen/$                    F        0.85     0.68    0.37    6.66     2.18     3.24     2.78    0.63
Gold                     F       −5.35     1.33   −0.06    6.70   −12.63   −10.96   −10.66   −2.02
Corn                     F       −3.99     1.20   −0.14    6.36    −9.59    −6.98    −7.92   −1.66
Live cattle              F        2.87     0.99   −0.13    3.37     7.52     8.79     8.87    1.45

S and F respectively indicate spot and futures returns. The sample sizes n are between 2460 and 2560 
and are listed in Table 2.2. r̄, s, b, and k are the mean, standard deviation, skewness, and kurtosis for a 
sample of returns, as defined in Section 4.2. The crash week, commencing on Sunday, 18 October 1987, 
is excluded when the stock skewness and kurtosis figures are calculated. z = √n r̄/s. The average annual 
return estimates G, A, and A* are defined in Section 4.3. 


4.3 Average Returns and Risk Premia 
4.3.1 Annual Averages 


The average return $\bar{r}$ over one day is, of course, very small and it is often more 
practical to discuss averages over longer periods, particularly one year. Three 
annual average measures are listed in Table 4.1 under the headings G, A, and A*. 

To motivate these measures, we represent the wealth of a typical investor by 
$w_t$ at time t, with $w_t = w_{t-1}\exp(r_t)$. This is appropriate if an investor reinvests 
dividends and so holds a quantity $q_t$ of the asset defined recursively by 
$q_t = q_{t-1}(1 + d_t/p_t)$, with $d_t$ the dividend paid out in period t (usually zero) and $p_t$ 
the price of one unit of the asset. Then $w_t = q_t p_t$. The quantities $p_t$ and $w_t$ will 
be identical when an asset never pays a dividend and $q_0 = 1$. 

Suppose a time series provides returns for T years with N = n/T return 
observations per year, on average. The constant annual return G that gives the 
same overall return solves 


$$(1 + G)^T = \frac{w_n}{w_0},$$


with $w_0$ and $w_n$ respectively the initial and final levels of wealth. Note that the 
average return can be calculated directly from these levels, as 


$$\bar{r} = \frac{1}{n}\bigl[\log(w_n) - \log(w_0)\bigr]$$


and thus 

$$G = \exp(N\bar{r}) - 1. \tag{4.2}$$


Investors will also be interested in the annual expected return during a future 
year, say from time n until time n + N. The simple annual return is then 


$$R = \frac{w_{n+N} - w_n}{w_n} = \exp\Bigl(\sum_{h=1}^{N} r_{n+h}\Bigr) - 1.$$

One estimate of E[R] is given by averaging simple annual returns, here denoted 
by $r^{(j)}$. For an integer value of T, the obvious estimate of E[R] is the arithmetic 
mean 

$$A = \frac{1}{T}\sum_{j=1}^{T} \frac{w_{Nj} - w_{N(j-1)}}{w_{N(j-1)}} = \frac{1}{T}\sum_{j=1}^{T} r^{(j)}. \tag{4.3}$$


Note that 1 + G is then the geometric mean of the T terms $1 + r^{(j)}$. This implies 
G is a downwards biased estimate of E[R], because G is less than the arithmetic 
mean A, which is unbiased. 

The sum of a year of N consecutive returns, $\sum_{h=1}^{N} r_{n+h}$, is approximately normal 
from a version of the central limit theorem, when the returns process is stationary 
and uncorrelated, say with mean μ and variance σ². The expected simple annual 
return is then 

$$E[R] = \exp(N\mu + \tfrac{1}{2}N\sigma^2) - 1. \tag{4.4}$$


This approximation suggests estimating E[R] by 

$$A^* = \exp(N\bar{r} + \tfrac{1}{2}Ns^2) - 1 = (1 + G)\exp(\tfrac{1}{2}Ns^2) - 1. \tag{4.5}$$


This estimate has much less bias than G when the assumptions are valid. The 
estimate could be useful when $\bar{r}$ and s are known but the annual terms $r^{(j)}$ are not 
available. However, A* could be seriously biased if the returns are autocorrelated, 
because then $\operatorname{var}\bigl(\sum_{h=1}^{N} r_{n+h}\bigr) \ne N\sigma^2$. 
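
The three annual measures can be computed in a few lines. The sketch below implements (4.2), (4.3), and (4.5); the function names are hypothetical, and N = 253 observations per year is an assumption (the text only defines N = n/T), so the printed values only approximate the tabulated S&P 500 figures.

```python
import math


def geometric_annual_return(mean_daily, periods_per_year):
    """G = exp(N * r_bar) - 1, equation (4.2)."""
    return math.exp(periods_per_year * mean_daily) - 1.0


def arithmetic_annual_return(annual_simple_returns):
    """A = average of the simple annual returns r^(j), equation (4.3)."""
    return sum(annual_simple_returns) / len(annual_simple_returns)


def lognormal_annual_return(mean_daily, sd_daily, periods_per_year):
    """A* = (1 + G) * exp(N * s^2 / 2) - 1, equation (4.5)."""
    g = geometric_annual_return(mean_daily, periods_per_year)
    return (1.0 + g) * math.exp(0.5 * periods_per_year * sd_daily ** 2) - 1.0


def geometric_from_annual(annual_simple_returns):
    """G computed as the geometric mean of the terms 1 + r^(j), minus one."""
    log_sum = sum(math.log(1.0 + r) for r in annual_simple_returns)
    return math.exp(log_sum / len(annual_simple_returns)) - 1.0


# Spot S&P 500 estimates from Table 4.1, with N = 253 assumed.
g = geometric_annual_return(6.42e-4, 253)
a_star = lognormal_annual_return(6.42e-4, 0.0098, 253)
print("G  =", round(100 * g, 2), "%")
print("A* =", round(100 * a_star, 2), "%")

# Hypothetical annual returns: the geometric mean is below the arithmetic mean.
annual = [0.10, -0.05, 0.20]
print(geometric_from_annual(annual), arithmetic_annual_return(annual))
```

The second example shows why G is a downwards biased estimate of E[R]: the geometric mean of the terms $1 + r^{(j)}$ cannot exceed their arithmetic mean.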

Merton (1980) has shown that estimates of annual expected returns must be 
inaccurate because prices are too volatile to permit accurate estimates. To 
illustrate the problem, consider the estimate G when ten years of independent daily 
returns are generated by a normal distribution whose mean and standard 
deviation are given by the tabulated estimates for the spot S&P 500 index. Then 
$\mu = 6.42 \times 10^{-4}$, $\sigma = 0.0098$ and E[R] = 19% for a year of 250 trading days. 
With a 95% probability, $250\bar{r}$ is within the interval 0.160 ± 0.096, and thus a 95% 
probability interval for $G = \exp(250\bar{r}) - 1$ is from 7% to 29%. 
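
The arithmetic behind this interval can be retraced as follows. The sketch uses only the numbers quoted above; the variable names and the normal quantile 1.96 for a 95% interval are the only additions.

```python
import math

mu = 6.42e-4          # daily mean, spot S&P 500 estimate
sigma = 0.0098        # daily standard deviation
days_per_year = 250
years = 10
n = days_per_year * years

# 250 * r_bar has mean 250 * mu and standard deviation 250 * sigma / sqrt(n).
center = days_per_year * mu
half_width = 1.96 * days_per_year * sigma / math.sqrt(n)

# Transform the interval for 250 * r_bar into an interval for G.
g_low = math.exp(center - half_width) - 1.0
g_high = math.exp(center + half_width) - 1.0

# Expected simple annual return from equation (4.4).
expected_r = math.exp(days_per_year * mu + 0.5 * days_per_year * sigma ** 2) - 1.0

print("250 * r_bar interval:", round(center, 3), "+/-", round(half_width, 3))
print("interval for G:", round(100 * g_low), "% to", round(100 * g_high), "%")
print("E[R]:", round(100 * expected_r), "%")
```

Even with ten years of daily data, the interval for G spans more than twenty percentage points, which is the inaccuracy that Merton emphasizes.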

Table 4.1 shows the values for G, A, and A* in percentage units. These figures 
underestimate average returns for the spot FTSE and Nikkei indices because 
dividends are excluded from the calculations. The "geometric mean return" G is 
typically 1-2% per annum less than the "arithmetic mean return" A. The S&P 
500 spot returns should exceed the futures returns by a figure similar to the 
risk-free rate of interest, as capital is only required for spot investments. The actual 
difference is about 8% per annum. The FTSE 100 spot average should be greater 
than the futures figure by a risk-free interest rate minus the dividend yield. The 
actual difference is about 6%. 

The z-statistics in the final column of Table 4.1 are for the standard test that 
returns have a zero population mean. Not surprisingly, several stock series have 
significant positive values of z, at the 5% level. The stock futures series do not 
have significant values, although theory states that the expected futures return 
is positive; it is reasonable to conclude that Type II errors are made. The other 
futures series have insignificant values of z, except for the gold series, and hence 
both $\bar{r}$ and G are not significantly different from zero. 


4.3.2 Equity Risk Premia 


The expected return from an investment in a market portfolio of stocks exceeds 
the return from riskless investments by an amount known as the equity risk 
premium. A positive premium is required by finance theory, otherwise there is no 
incentive to accept undiversifiable risk. The numbers discussed above show that 
accurate estimates of the equity risk premium require a long historical record and 
the optimistic assumption that the premium is constant. Estimates of the 
average premium only require the weaker assumption that the premium follows a 
stationary process. 

Dimson, Marsh, and Staunton (2002) provide information about the equity risk 
premium for many countries during the twentieth century. A US geometric 
premium estimate of 6% is given by the difference between the geometric returns 
G for stocks and bills from 1900 to 2000, while the arithmetic premium 
estimate A from annual returns is 8%. The standard errors of these estimates are 
approximately 2%. The premia for many other countries were similar to the US 
level. 

Historic premia estimates between 6% and 8% may well overestimate future 
expected premia, particularly if the estimates assume the market always survives 
(Jorion and Goetzmann 1999). Fama and French (2002) use US dividend and 
earnings growth rates from 1951 to 2000 to obtain premium estimates between 
2.5% and 4.5%, compared with the historic estimate of 7% for the same period. 
These numbers can be reconciled by saying that US stock returns were much 
higher than expected. In contrast, Ibbotson and Chen (2003) obtain estimates only 
1% less than the historic average by considering several measures of financial 
and economic performance. Further estimates of rational premia derived from 
theoretical pricing models are provided by Arnott and Bernstein (2002) and Bansal 
and Lundblad (2002). 

All the above equity premia estimates are only for domestic market portfolios. 
Other portfolios will have different premia that reflect their exposure to 
market-wide and other risk factors. The empirical evidence against single-factor capital 
asset pricing models (CAPMs) finds that firm size and book values relative to 
market values have either been important factors in the past or proxies for such 
factors. Cross-sectional asset pricing models are, however, outside the scope of 
this book. Interesting empirical results can be found in Fama and French (1992, 
1995), Daniel and Titman (1997), and Ferguson and Shockley (2003) for the US, 
and in Fama and French (1998) and Hawawini and Keim (2000) for markets 
around the world. 


4.3.3 Futures Risk Premia 


Finance theory states that futures returns equal spot returns minus the risk-free rate 
when futures prices are determined by spot prices and a no-arbitrage condition 
(see Section 2.5). When premia are determined by a single-factor CAPM, the 
futures risk premium is the futures β multiplied by the market risk premium. Black 
(1976a), following Dusak (1973), developed the idea that β is zero for agricultural 
futures and hence these futures have zero risk premia and zero expected returns. 
A similar conclusion holds for currency futures as empirical estimates of β are 
close to zero (Taylor 1992). It is a mistake to extend Black's conclusion to stock 
index futures because they have positive β, usually close to one, and hence have 
expected returns similar to the equity risk premium. 

Keynes (1930) and others have argued that producers of agricultural commodities 
are net sellers of futures, hence speculators are net buyers and can demand 
to be rewarded by a positive risk premium. Direct tests of this proposition have 
met with limited success. For example, Bodie and Rosansky (1980) estimate an 
average premium of 10% per annum (standard error 4%) for portfolios made up 
of long futures positions. However, their choice of years (1950-1976) may be 
fortuitous. Later studies have found almost no evidence for a positive premium 
in commodity futures (Kolb 1992; Bessembinder 1993). The generally small 
z-statistics in Table 4.1 are evidence against a constant, nonzero premium. 

Any risk premium can vary through time. Furthermore, speculators are net 
buyers of futures at some times and net sellers at other times in the 
agricultural sector (Chang 1985). The general idea of time-varying risk premia (TVRP), 
that may average zero in the long run, has been developed for currency futures 
markets in particular. International asset-pricing models show there can be TVRP 
in theoretical models (Adler and Dumas 1983; Hodrick 1987). There is an 
extensive literature on this subject that has found little evidence for TVRP (Hodrick 
1987; MacDonald 1988). We return to this subject in Section 7.10 when discussing 
payoffs from futures trading rules. 


4.4 Standard Deviations 


The numbers s in Table 4.1 are unconditional standard deviations. They provide 
information about the historical standard deviation of a daily return when 
nothing is known about the recent past. When recent returns are available we can 
try to calculate standard deviations conditional on the recent information. These 
conditional standard deviations vary considerably through time as we will see in 
Chapters 9-11. This phenomenon, known as conditional heteroskedasticity, can 
be sufficient to cause unconditional estimates to vary considerably from year to 
year. Estimates of daily standard deviations over periods as long as a decade can 
also vary from period to period, because of nonstationarity that may be attributed 
to structural changes in the economic environment. 

The daily standard deviations in Table 4.1 are between 0.6% and 1.8%, with 
the exception of the very low value for bill futures. The column of standard 
deviations can be ranked to show that returns from currencies and bonds have the 
least variability, with equity indices more variable and the highest ranks going to 
the returns from individual stocks. Our figures and others given by Perry (1982), 
Kon (1984), Brown and Warner (1985), and Blair, Poon, and Taylor (2002) suggest 
that a large firm may have a standard deviation (s.d.) 50-100% larger than the s.d. 
for a well-diversified index. Furthermore, the average s.d. for randomly selected 
firms may be more than three times as large as the s.d. for an index. Hawawini 
and Keim (1995) show that US portfolios formed by allocating stocks to ten size 
categories have standard deviations that monotonically increase as the firm size 
decreases. The s.d. of monthly returns, from 1951 to 1989, was 4.1% for the 
portfolio of largest stocks, increasing to 6.8% for the portfolio of smallest stocks. 

The possibility of a nonstationary variance over long periods of time makes it 
difficult to produce useful long-run estimates of price volatility. Table 4.2 offers 
some approximate ranges for daily and annual returns. The annual standard 
deviations are given by multiplying the daily figures by the square root of the number 
of trading days in one year, here assumed to be 253. This method requires daily 
returns to be uncorrelated. 

Table 4.2. Typical standard deviations. 

                              Percentage standard deviation
Asset                         Daily returns    Annual returns

Currencies                    0.6-0.9%         10-14%
Diversified stock index       0.7-1.3%         11-21%
Stock of a large US firm      1.2-2.0%         19-32%
Commodities                   1.0-2.0%         16-32%


These figures can be used to provide some insight into the range of possible 
payoffs from an annual investment. To do this, suppose daily returns are 
uncorrelated, that their standard deviation σ is the midpoint of some range above, and 
that their annual sum has a normal distribution with mean Nμ and variance Nσ². 
Also, suppose the expected simple annual return is known so μ can be deduced 
from equation (4.4). Then a $1 investment will produce a payoff $1 + R after one 
year that has a lognormal distribution, as log(1 + R) ~ N(Nμ, Nσ²). For spot 
currency offering an expected annual return of 5%, the payoff will be between 
$0.83 and $1.33 with a 95% probability. Similarly, a $1 investment in a large firm 
that returns 12% per annum, on average, will provide a payoff between $0.70 and 
$1.79 with a 95% probability. 
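
The payoff calculation can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the daily standard deviation 0.75% is taken as the midpoint of the currency range, N = 253 trading days is used, and Nμ is deduced from equation (4.4). Because of these choices and rounding, the output need not agree exactly with the figures quoted in the text.

```python
import math


def annual_sd(daily_sd, trading_days=253):
    """Annual standard deviation, assuming uncorrelated daily returns."""
    return daily_sd * math.sqrt(trading_days)


def payoff_interval(expected_annual_return, daily_sd, trading_days=253, z=1.96):
    """95% interval for the payoff 1 + R when log(1 + R) ~ N(N*mu, N*sigma^2).

    N*mu is deduced from (4.4): E[R] = exp(N*mu + N*sigma^2 / 2) - 1.
    """
    var_annual = trading_days * daily_sd ** 2
    mean_log = math.log(1.0 + expected_annual_return) - 0.5 * var_annual
    half_width = z * math.sqrt(var_annual)
    return math.exp(mean_log - half_width), math.exp(mean_log + half_width)


# Spot currency: expected annual return 5%, daily s.d. 0.75% (assumed midpoint).
print("annual s.d.:", round(100 * annual_sd(0.0075), 1), "%")
low, high = payoff_interval(0.05, 0.0075)
print("payoff interval for $1:", round(low, 3), "to", round(high, 3))
```

The interval is asymmetric around the expected payoff because the payoff is lognormal rather than normal.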


4.4.1 Futures 


Spot and futures volatility should be very similar when a theoretical no-arbitrage 
condition ties the futures price to the spot price, as described by the equations 
in Section 2.5. The two volatility figures are identical, in theory, when price 
changes are uncorrelated and the real dividend yield is constant (Barone-Adesi 
and Whaley 1987). This result holds regardless of the final settlement date of the 
futures contract. It then follows that the standard deviation of futures returns is not 
a function of the time until delivery. However, Samuelson (1976) shows that when 
the spot price follows a stationary process (so spot price changes are correlated), 
then the futures standard deviation ought to increase as the delivery date comes 
closer. Taylor (1985) found no evidence to support the idea of increasing volatility 
during the final six trading months of currency and commodity futures contracts. 
Antoniou and Holmes (1994) report the same conclusion for FTSE 100 futures. 
The no-arbitrage prediction of equal volatility is contradicted by the higher 
standard deviation of stock index futures returns compared with spot measurements 
for the same index. The numbers in Table 4.1 show the futures standard deviation 
is 38% higher than the spot figure for the decade of S&P 500 returns considered 
here, although the estimate falls to 25% if the crash week is excluded. The 
comparable FTSE 100 figure is an extra 15% for all the data, or 17% when the crash 
week is ignored. Board and Sutcliffe (1995) survey international evidence and 
discuss many possible explanations. A strong contender is an understatement of spot 
volatility due to stale prices being included in the spot index (Stoll and Whaley 
1988). This argument is supported by the results in Ahn, Boudoukh, Richardson, 
and Whitelaw (2002) for 24 indices from 15 countries. The spot standard 
deviation for daily returns is less than the futures value for 23 of the 24 indices. Some 
of their estimates are 1.00% (spot) and 1.21% (futures) for the S&P 500, 0.96% 
and 1.11% for the FTSE 100, and 1.21% and 1.37% for Tokyo's TOPIX index. 
Additional microstructure explanations, for example, bid-ask bounce in futures 
prices and noise traders preferring futures transactions, may also be relevant. 


4.5 Calendar Effects 


Average equity returns have varied significantly depending on the day of the 
week, the day of the month, the month of the year, and the proximity of holidays. 
They have even varied with the relative positions of the Sun, Earth, and Moon! 
These calendar and cyclical anomalies are now discussed in detail. They are, 
however, ignored in most of this book because their implications for specifying 
and estimating models for returns are usually unimportant. 

Efficient market theory states that anomalies may disappear once they are 
described by academics to the investment community because any profitable 
opportunities will be traded out of existence. They will also seem to disappear if they are 
merely the result of data mining. There is indeed evidence that some anomalies 
have disappeared in recent years, such as the Monday effect (Rubinstein 2001; 
Sullivan, Timmermann, and White 2001; Schwert 2003), the turn-of-the-month 
effect (Maberly and Waggoner 2000), and the size effect (Dimson and Marsh 
1999; Schwert 2003). 

A more severe criticism of calendar anomalies is that many of them may be 
merely the result of many researchers testing many hypotheses on the same data. 
Sullivan et al. (2001) show that the significance of calendar trading rules is much 
weaker when it is assessed in the context of a universe of rules that could plausibly 
have been evaluated. They observe that none of the calendar anomalies they 
investigate were discovered following a theoretical prediction. Instead, almost all 
the theoretical explanations that now exist are ex post rationalizations. 

There are two reasons, however, for supposing that calendar effects are not 
merely curious results uncovered by data mining. First, the anomalous behavior 
has been found in almost all countries (Hawawini and Keim 1995). Second, effects 
first discovered in relatively recent US data are also found throughout much 
longer periods, for example, during the ninety years from 1897 to 1986 studied 
by Lakonishok and Smidt (1988). 


4.5.1 Day-of-the-Week


Monday returns measure the result of an investment for 72 hours from Friday's 
close to Monday's close. Expected equity returns for Mondays should therefore 
be higher than for 24-hour returns on other days of the week. Seemingly over- 
whelming evidence for US equities, following Fields (1931) and Cross (1973), 
shows, however, that average Monday returns have been both lower and negative 
from 1897 until the 1980s. This anomaly is particularly puzzling because there
then seems to have been no compensation for accepting equity risk during the
weekend and/or on Monday.

Table 4.3. Percentage return statistics, given by French.

        Monday   Tuesday   Wednesday   Thursday   Friday
Mean    -0.168     0.016       0.097      0.045    0.087
s.d.     0.843     0.727       0.748      0.686    0.660

French (1980) reports the means and standard deviations (s.d.) for daily, per- 
centage returns from the Standard & Poor Composite Index between 1953 and 
1977 (see Table 4.3). The Monday returns are lower than for other days, by 0.2% 
on average. This is a small difference for one day but it is substantial over one 
year. French shows that Monday’s mean was negative for 20 of the 25 years and 
for all five subperiods of five years. Rubinstein (2001) notes that Monday’s mean 
was negative for all twelve five-year periods from 1928 to 1987 and that Monday 
always had the worst five-year average return. Lakonishok and Smidt (1988) go 
back even further, to 1897. There is some evidence that the period of negative 
mean returns has been confined to the weekend and early trading on Monday 
(Harris 1986; Abraham and Ikenberry 1994). 

Although the negative Monday average was a persistent phenomenon in the 
US for many years, the empirical evidence is markedly different for recent years. 
Monday returns have been slightly higher than for the other days of the week after 
the publication of Cross (1973), from 1973 until 1996 (Sullivan et al. 2001). Fur- 
thermore, Monday was the best day in the decade from 1989 to 1998 (Rubinstein 
2001). 

Significant day-of-the-week effects have been found in many other countries: 
for Australia, Canada, Japan, and the UK by Jaffe and Westerfield (1985); for 
Canada, Singapore, and the UK by Condoyanni et al. (1987); for Japan from 1949 
to 1988 by Ziemba (1991); and for several European countries between 1986 and 
1992 by Chang, Pinegar, and Ravichandran (1993). Negative Monday averages 
are common for European stock indices and also occur in Japan. Negative Tuesday 
averages are found for Pacific Rim countries, including Japan; this Tuesday effect 
may reflect the earlier Monday effect in New York. 

It must be appreciated that the negative mean effects are small. The Monday 
average return is between -0.2% and -0.1% in many studies. Selling on Friday
and buying late on Monday would lose money after the historically appropriate 
transaction costs are deducted. 

Some explanations for the day-of-the-week effects observed in older datasets 
are reviewed by Hawawini and Keim (1995). They are found to be unconvincing, 
not least because the general effect is robust across countries. Nevertheless, we 
may note that Abraham and Ikenberry (1994) support the idea that US individual 
investors have tended to sell on Mondays after reviewing their portfolios at the
weekend following negative returns on Friday, that Chen and Singal (2003) show 
some short sellers close positions (i.e. buy) on Fridays and reopen them (i.e. sell) 
on Mondays, and that Penman (1987) finds that bad news about US earnings has 
been more likely than good news to be announced at the weekend. 

Regression tests that use dummy variables for the day-of-the-week are identi- 
cal to one-way ANOVA tests. These tests provide an F-statistic which is usually 
so large that the hypothesis of equal expected returns for all days is rejected 
at very low significance levels. Connolly (1989, 1991) has criticized these tests 
because they ignore nonnormality, conditional heteroskedasticity, and autocorre- 
lation. Furthermore, for large samples and conventional significance levels (e.g. 
5%), Type I errors can be more common than Type II errors when Bayesian meth- 
ods are used to calculate the error probabilities. Connolly reinterpreted test results 
from a Bayesian perspective and argued that the US day-of-the-week effect had 
disappeared by 1975. Abraham and Ikenberry (1994) disagreed and showed that 
negative Monday averages were a consequence of even lower averages when Fri- 
day's return is negative. Chang et al. (1993) applied Connolly's Bayesian test 
methodology and found that European effects were significant after 1985. 
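
The F-statistic used in these tests can be illustrated with a minimal sketch. The function name `anova_f` and the simulated returns are assumptions made purely for illustration; the simulated data are independent normal draws whose means and standard deviations are taken from Table 4.3, so the sketch shares the shortcomings that Connolly identifies (normality, constant variance within a day, no autocorrelation).

```python
import random
from statistics import fmean


def anova_f(groups):
    """One-way ANOVA F-statistic; `groups` holds one list of returns per category."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    means = [fmean(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Variation of the category means around the grand mean.
    between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the returns around their own category mean.
    within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical data: 1250 simulated returns per weekday (roughly 25 years),
# using the Monday-to-Friday means and standard deviations of Table 4.3.
random.seed(0)
french_means = [-0.168, 0.016, 0.097, 0.045, 0.087]
french_sds = [0.843, 0.727, 0.748, 0.686, 0.660]
simulated = [[random.gauss(m, s) for _ in range(1250)]
             for m, s in zip(french_means, french_sds)]
f_stat = anova_f(simulated)
print(f"F = {f_stat:.1f}")
```

With five weekday categories the statistic is referred to an F distribution with 4 and n - 5 degrees of freedom. The same number is obtained from the regression of returns on day-of-the-week dummy variables when testing that all the dummy coefficients are equal.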


4.5.2 Holidays 


US markets have often closed for eight days in a year. Ariel (1990) shows that 
more than one-third of the total return from US stock market portfolios was 
earned on the eight days before the holidays, during the period from 1963 to 
1982. The average pre-holiday return from the CRSP value-weighted index was 
0.36% during this period, compared with 0.03% for other days. Statistical tests
show the difference is highly significant and not a consequence of other calendar 
anomalies. Ariel notes that Merrill (1966) had found that the Dow Jones Industrial 
Average advanced more frequently on days preceding holidays during the earlier 
period from 1897 to 1965. Ariel analyzes possible explanations for the holiday 
effect but finds they all have shortcomings. Lakonishok and Smidt (1988) confirm 
Ariel's findings back to 1897 for individual stocks. 

There is some evidence for a pre-holiday effect in Japan (Ziemba 1991) and 
the UK. Kim and Park (1994) estimate mean returns for the UK FT-30 index 
from 1972 to 1987 to be 0.22% on the day before a holiday, -0.14% for the
holiday and the next day, and 0.04% on other days. Their mean estimates for the
Japanese Nikkei-Dow index during the same years are 0.19% before a holiday
and 0.04% on other days. The comparable estimates of the US effect, from the
S&P 500 index, are further apart at 0.30% (pre-holiday) and 0.02% (others). All 
these estimates ignore dividends. The UK and Japanese holiday effects are not 
the same as the US effect because they are found when the US market is open 
and the other markets are closed. 
Table 4.4. Average percentage daily returns,
by day of the week and before and after holidays.

Series                      Mon      Tue      Wed      Thu      Fri      Pre-h    Hol      F

S&P 500-share           S  -0.02     0.10*    0.13*    0.05     0.05     0.24*   -0.01     1.60
S&P 500-share           F   0.04     0.00     0.10     0.04    -0.06     0.05     0.03     0.66
Coca Cola               S   0.12     0.24*    0.15*    0.10     0.03     0.08    -0.19     1.11
General Electric        S   0.16*    0.14*    0.04     0.01     0.02     0.21    -0.13     1.03
General Motors          S   0.20*    0.08     0.08    -0.02    -0.07     0.20    -0.07     1.25
FT 100-share            S  -0.13*X   0.06     0.09*    0.04     0.09*    0.18     0.12     3.20*
FT 100-share            F  -0.15*X   0.07     0.09     0.01     0.01     0.18     0.16     2.60*
Glaxo                   S  -0.13X    0.18*    0.27*    0.19*    0.19*    0.10     0.39     2.50*
Marks & Spencer         S  -0.17*X   0.15     0.24*    0.09     0.16*    0.12     0.03     3.23*
Shell                   S  -0.19*X   0.12*    0.14*    0.10     0.16*    0.23     0.22     4.37*
Nikkei 225-share        S  -0.18*X  -0.01     0.10     0.16*    0.02     0.15     0.08     3.09*
Treasury bonds          F  -0.02     0.09*   -0.02     0.05     0.02     0.19*   -0.03     1.89
3-month sterling bills  F   0.01     0.00     0.01    -0.00    -0.00     0.01    -0.03     0.86
DM/$                    F  -0.03     0.02     0.03     0.03    -0.08     0.02    -0.10     0.88
Sterling/$              F  -0.02     0.05     0.04     0.01    -0.05     0.02     0.00     0.95
Swiss franc/$           F  -0.02     0.03     0.01     0.03    -0.05     0.03    -0.15     0.97
Yen/$                   F  -0.02     0.03     0.03     0.05    -0.04     0.04    -0.08     1.23
Gold                    F  -0.16*    0.04    -0.07     0.00    -0.1      0.34*   -0.27     2.73*
Corn                    F  -0.07     0.06    -0.01     0.04    -0.07     0.13     0.06     0.53
Live cattle             F  -0.04    -0.05    -0.02     0.07     0.13*    0.28*    0.15     3.16*

“Pre-h” is a day that precedes a holiday. “Hol” refers to returns for a holiday period and the subsequent
open-market day. Starred averages are significantly different from zero, at the 5% significance level, when
the standard hypothesis test is applied. Monday averages are followed by “X” if they differ significantly
from the Tuesday to Friday average, at the 5% level. F is the one-way ANOVA test statistic for a null
hypothesis of equal expected returns; stars indicate values beyond 2.10, which is the 95% quantile of the
relevant F distribution.


4.5.3 Weekday Results for Twenty Series 


Table 4.4 documents day-of-the-week and holiday mean returns for our twenty 
time series. These series are shorter than those already discussed and thus it 
is less easy to identify any anomalous effects. Each return is assigned to one of 
seven categories. Monday returns are for all 72-hour periods when the market was 
open on Friday, Monday, and Tuesday. Tuesday returns are for 24-hour periods 
when the market was open from Monday to Wednesday inclusive; likewise for 
Wednesday, Thursday, and Friday returns. The remaining returns are assigned 
either to a pre-holiday category if the market is closed on the next weekday or to 
a holiday category if the market is closed on the previous weekday. 
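
The assignment of returns to the seven categories can be sketched as follows. The function names, the hypothetical trading calendar, and the tie-breaking choice (a day with closures on both sides is labeled pre-holiday) are assumptions for illustration; the sketch also assumes that markets only trade from Monday to Friday, so it does not cover the Saturday sessions once held in Tokyo.

```python
from datetime import date, timedelta

WEEKDAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri"]


def adjacent_weekday(d, step):
    """Nearest Monday-to-Friday date before (step=-1) or after (step=+1) d."""
    d += timedelta(days=step)
    while d.weekday() >= 5:  # skip Saturday (5) and Sunday (6)
        d += timedelta(days=step)
    return d


def classify(d, open_days):
    """Category of the return that ends at the close of open day d."""
    if d not in open_days:
        raise ValueError("no return is recorded for a closed day")
    prev_open = adjacent_weekday(d, -1) in open_days
    next_open = adjacent_weekday(d, +1) in open_days
    if prev_open and next_open:
        return WEEKDAY_NAMES[d.weekday()]
    if not next_open:
        return "Pre-h"  # assumption: pre-holiday takes priority
    return "Hol"


# Hypothetical calendar: weekdays from 18 December 1989 to 5 January 1990,
# with closures on Monday 25 December and Monday 1 January.
# The first and last days of a sample need separate treatment, because
# their neighbouring days fall outside the calendar.
start = date(1989, 12, 18)
closed = {date(1989, 12, 25), date(1990, 1, 1)}
all_days = [start + timedelta(days=i) for i in range(19)]
open_days = {d for d in all_days if d.weekday() < 5 and d not in closed}

for d in sorted(open_days)[1:-1]:
    print(d.isoformat(), classify(d, open_days))
```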

The average percentage returns are followed by a star in Table 4.4 if the average 
for the category is significantly different from zero, using a standard two-tail test 
and the 5% significance level. There are six starred negative averages: all are for 
Mondays and are between -0.19% and -0.13%. Four of these six averages are
for UK equity series. The six Monday equity averages marked with the letter “X”
are significantly different from the overall averages for Tuesdays to Fridays for
the same series (5% level, two-tail test). There are 25 starred positive averages
including the Monday averages for General Electric and General Motors. Of the
31 starred averages, 24 are for spot equity series and only 7 are for the futures
series. None of the currency futures averages are significant. 

All the pre-holiday averages are positive and, with one exception, are more 
than the holiday averages. Also, the pre-holiday average is the highest of the 
seven averages for a majority of the assets considered. The significant estimate 
of 0.24% for the S&P 500 index, from July 1982 to June 1992, is similar to the 
average found by Ariel (1990) for the earlier period from 1963 to 1982. 

The F-statistic in the far-right column of Table 4.4 is the ANOVA test statistic 
for equal expected returns across the categories. The degrees-of-freedom of the 
test statistic are 6 and 2400+ and hence the null distribution is approximately $F_{6,\infty}$
when the usual assumptions are made. Some of the assumptions, such as normal 
distributions, are false although this may not be very important. The eight starred 
F-statistics are for the five UK equity series, the Nikkei index, gold futures, and 
cattle futures. Rigorous tests are possible by maximizing likelihoods within an 
ARCH framework, using the methodology described in Chapter 10. 
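
The critical value of 2.10 quoted for Table 4.4 can be checked with a short sketch. It relies on the standard result that an F variable with 6 and infinitely many degrees of freedom is a chi-squared variable with 6 degrees of freedom divided by 6; the function names are hypothetical and the quantile is found by simple bisection rather than by a library routine.

```python
import math


def chi2_sf_even(x, df):
    """P(X > x) for a chi-squared variable X with an even number df of degrees of freedom."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def f_quantile_infinite_denominator(p, df1):
    """p-quantile of F(df1, infinity), i.e. the chi-squared(df1) quantile divided by df1."""
    lo, hi = 0.0, 200.0
    for _ in range(200):  # bisection on the chi-squared distribution function
        mid = (lo + hi) / 2.0
        if 1.0 - chi2_sf_even(mid, df1) < p:
            lo = mid
        else:
            hi = mid
    return hi / df1


critical_value = f_quantile_infinite_denominator(0.95, 6)
print(f"95% quantile of F(6, infinity): {critical_value:.3f}")
```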

The extreme returns in the week commencing on 19 October 1987 have a 
noticeable impact on a few of the equity averages in Table 4.4. If we exclude the 
five returns in the crash week, the Monday S&P 500 futures average increases 
from 0.04% to 0.11% and becomes significant at the 5% level; also, the Coca 
Cola Monday average moves up to 0.17% and is significant, two US averages fall 
into the insignificant region and two UK averages increase to significant positive 
values. The F-statistics for the US equity series change by as much as 0.44 when 
the crash week is removed but most of the changes in the F-statistics are small 
and the list of significant test statistics does not change. 


4.5.4 Day-of-the-Month 


Ariel (1987) reports the remarkable result that all of the US stock market's cumu- 
lative advance from 1963 to 1981 occurred in regular half-month periods. Average 
returns were only positive for the last trading day of the month and for trading 
days in the first half of the month during these years. These results still occur when 
large returns around the start of the year are excluded. Once more, Lakonishok 
and Smidt (1988) find evidence for the calendar anomaly back to 1897. They 
also show that positive average returns are particularly high for the four trading 
days that commence on the last trading day of a month. This turn-of-the-month 
anomaly has also been found in Japanese index returns (Ziemba 1991) and for 
other countries (Jaffe and Westerfield 1989), but it disappeared from the S&P 500 
index after 1990 (Maberly and Waggoner 2000). 

The empirical evidence might be consistent with buying pressure following 
the payment of monthly salaries (Ogden 1990). Also, Penman (1987) shows that 
US companies have been more likely to publish their earnings information in the
first two weeks of a calendar quarter if the earnings news is good. This result
holds for each of the four quarters and average returns are higher in these “good
news” periods. Earnings news may therefore explain some of the day-of-the-month
anomaly (as well as low Monday returns), but only if investors irrationally
ignore the bad signal implicit in a delayed earnings announcement. 

There is no evidence for day-of-the-month effects that are statistically signif- 
icant for our twenty series. Higher average returns were recorded for the eleven 
equity series during the period from the last day of a month until the middle of 
the next month, particularly for all five index series until the end of the first week 
of the month. 


4.5.5 Month-of-the-Year 


Rozeff and Kinney (1976) show that returns from equal-weighted US stock indices 
were significantly higher at the start of the new tax year in January than in 
other months, during the period from 1904 to 1974. Praetz (1973) and Officer 
(1975) show in earlier research that Australian return distributions depend upon 
the month. The international study of Gultekin and Gultekin (1983) documents 
monthly mean returns for seventeen countries between 1959 and 1979. Mean 
returns were found to be significantly higher in January than in other months for 
thirteen of the seventeen countries. 

Small US firms have earned higher returns on average than predicted by a single- 
factor pricing model (Banz 1981) and these excess returns have been found to 
occur primarily at the start of the year (Keim 1983; Hawawini and Keim 2000). 
Thus the January seasonal effect interacts with a size effect, although the size 
premium in observed returns at the turn of the year may be exaggerated by bid- 
ask spreads (Keim 1989). The size effect can be seen clearly in average January 
returns for US indices given by Gultekin and Gultekin (1983): 5.1% for an equal- 
weighted index compared with only 1.0% for a value-weighted index. Tinic and 
West (1984, Table 7) estimate the average return in January as 4% per month, but 
only 1% on average for the other months, for US companies that had average-risk
(unit beta) between 1935 and 1982. 

The most popular explanation of the substantial January effect in the US market 
is the tax-loss selling hypothesis of Brown, Keim, Kleidon, and Marsh (1983). 
They suggest that selling pressure at the tax year-end depresses prices that rebound 
in January. The hypothesis is supported first by the absence of a January effect 
before 1917, when there was no incentive to sell for tax reasons (Schultz 1985; 
Jones, Lee, and Apenbrink 1991), and second by the year-end trading behavior 
of individual investors (Ritter 1988). The international evidence, however, is far 
from consistent, although other explanations are not more credible (Hawawini 
and Keim 1995). 
The recent paper by Bouman and Jacobsen (2002) documents a new time-of-the-year
anomaly. They show that monthly index returns are significantly lower
during the six months from May until October than the remaining half of the year 
from November to April. They motivate this splitting of the year by reference to an 
old market saying, “Sell in May and go away,” that is frequently mentioned in the 
European financial press. Their results are typically for the period from 1970 to 
1998, which postdates the market folklore. The average index return for Nov/Apr
is higher than for May/Oct for 36 of the 37 countries studied. Many of the differences
are statistically significant. Bouman and Jacobsen provide evidence that the
differences are not a rediscovery of the January effect. The average differences 
in one-month returns are 0.9%, 2.0%, and 1.5%, respectively, for the US, the
UK, and Japan. The t-values for these differences are 1.95, 3.10, and 2.62. They 
decrease when a January dummy variable is included in their regression equation 
to 1.61, 2.48, and 2.23. 

There is no compelling evidence for month-of-the-year effects for our twenty 
series. There are indications of anomalous monthly averages but the evidence from 
a decade of returns lacks statistical significance, as noted previously for day-of- 
the-month subperiods. The January averages are relatively high for the indices; 
for our US and UK series they are higher than for all other months and the Nikkei 
January average is only surpassed by the May average. However, none of the 
January index averages is significantly different to the average for the remaining 
eleven months, at the 5% level. 


4.5.6 Astronomy and Average Returns 


The number of hours of daylight depends on the calendar and upon latitude. 
When these hours decrease during the fall (autumn) many people are less content 
and a significant proportion suffer from depression. Motivated by these medical 
facts, Kamstra, Kramer, and Levi (2003) use regression methods to show that 
daylight is associated with average returns when asymmetric effects before and 
after the winter solstice are included in the analysis. As they expect, their effects 
increase with distance from the Equator and there is a phase difference of six 
months between the Northern and Southern Hemispheres. They give results for 
nine countries, finding evidence that stock market returns are lower in the fall and 
higher in the winter season. Related evidence about the sensitivity of returns to 
sunlight is provided by Hirshleifer and Shumway (2003). 

The phases of the moon repeat every 29.5 days and appear to be related to stock 
market returns. Dichev and Janes (2003) show that the average returns from all 
major US stock indices around new moon dates are nearly double those around 
full moon dates, when seven-day windows define the periods around the lunar 
dates. The annualized differences between new and full moon returns are then 
between 5% and 8%. These large differences are not, however, significant—the 
probability of observing a larger difference when no lunar effect exists is 34%
for their longest dataset, which contains daily Dow Jones returns from 1896 to 
1999. Higher averages around new moon dates are also found for all of the other 
six G7 countries and for all but one of eighteen further countries. The differences 
between new and full moon returns average more than 9% for all 25 countries 
and the difference is statistically significant at the 1% level, for the period from 
1973 to 2000. Yuan, Zheng, and Zhu (2001) provide further results. 


4.5.7 Autocorrelation Induced by Calendar Effects 


Calendar anomalies are puzzling and may have become less pronounced in recent 
years. On their own they have little impact on the autocorrelations of daily returns 
because daily means are very small compared with daily standard deviations. 
Appendix 4.13 provides some theoretical results when returns are the sum of 
a white noise process and a mean process determined solely by the calendar. 
The maximum autocorrelation induced by day-of-the-week effects is 0.02, at 
lag 5, when the daily means are the numbers reported by French (1980) and we 
pretend there are no holidays, so that Monday returns are always separated by 
five days. The maximum autocorrelation induced by either the day-of-the-month 
or the month-of-the-year effects is less than 0.003 for reasonable estimates of the 
magnitudes of these effects. 
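
The figure of 0.02 at lag 5 can be reproduced approximately with a minimal sketch. It does not reproduce the derivation of Appendix 4.13; instead it assumes that returns are the sum of a periodic mean with period five days and white noise whose variance equals the average of the five daily variances in Table 4.3. The function name is hypothetical.

```python
# Daily means and standard deviations (percent) from Table 4.3, Monday to Friday.
day_means = [-0.168, 0.016, 0.097, 0.045, 0.087]
day_sds = [0.843, 0.727, 0.748, 0.686, 0.660]


def induced_autocorrelations(means, noise_variance, max_lag):
    """Autocorrelations of (periodic mean + white noise) at lags 1..max_lag."""
    period = len(means)
    overall = sum(means) / period
    dev = [m - overall for m in means]
    # Total variance is the noise variance plus the variance of the means.
    total_variance = noise_variance + sum(d * d for d in dev) / period
    rho = []
    for lag in range(1, max_lag + 1):
        cov = sum(dev[i] * dev[(i + lag) % period] for i in range(period)) / period
        rho.append(cov / total_variance)
    return rho


# Assumption: constant noise variance equal to the average daily variance.
noise_variance = sum(s * s for s in day_sds) / len(day_sds)
rho = induced_autocorrelations(day_means, noise_variance, max_lag=5)
for lag, value in enumerate(rho, start=1):
    print(f"lag {lag}: {value:+.4f}")
```

The lag 5 value is the largest because only at multiples of five days do the Monday means coincide.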


4.5.8 Standard Deviations and the Calendar 


Although it is difficult to explain why expected returns depend on the calendar, it 
is easy to produce one plausible explanation of calendar variations in the standard 
deviations of returns. Standard deviations are measures of price variability and 
prices change more frequently when there is more news. In particular, Monday 
returns reflect news on three calendar days and may be expected to be more vari- 
able than other returns. Monday returns will have a variance three times as large 
as the variance on other weekdays if prices follow geometric Brownian motion 
in calendar time. Likewise, holiday returns may be expected to have additional 
variability. 

The Monday effect is seen in the standard deviations given by French (1980). 
The standard deviation of Monday returns is 0.84%, compared with 0.71% for 
other days. The additional Monday variation is about 18% (= 0.84/0.71 - 1),
which is much less than the 73% (= $\sqrt{3}$ - 1) expected when news arrives at the
same rate on all days including Saturdays and Sundays. Not surprisingly, observed 
standard deviations of returns imply, first, that less relevant information is pro- 
duced during the weekend than on weekdays and, second, that less is produced 
during hours when stock markets are closed (French and Roll 1986). 

Table 4.5. Estimated percentage proportions of weekly variance, by day of the week.

Series                     Mon     Tue     Wed     Thu     Fri

S&P 500-share           S  21.3    21.2    19.3    19.3    19.0
S&P 500-share           F  20.2    22.0    19.6    19.5    18.7
Coca Cola               S  20.1    20.5    19.0    19.7    20.7
General Electric        S  21.0    20.7    18.1    18.9    21.2
General Motors          S  19.6    20.2    19.1    20.7    20.4
FT 100-share            S  23.4*   19.0    20.8    18.2    18.6
FT 100-share            F  23.8*   18.8    19.4    18.4    19.7
Glaxo                   S  20.4    20.8    19.7    20.1    19.1
Marks & Spencer         S  20.2    22.3*   20.3    18.4    18.7
Shell                   S  20.0    20.4    19.1    21.6    19.0
Nikkei 225-share        S  25.8*   18.6    20.0    18.3    17.2*
Treasury bonds          F  20.3    16.8*   17.1*   18.8    27.1*
3-month sterling bills  F  25.*    18.5    17.3*   18.2    20.8
DM/$                    F  23.5*   19.1    17.2*   18.9    21.3
Sterling/$              F  24.7*   20.4    16.3*   17.3*   21.3
Swiss franc/$           F  22.9*   19.5    16.9*   19.1    21.7
Yen/$                   F  22.0    19.5    17.0*   19.9    21.6
Gold                    F  25.1*   17.4*   17.6*   17.9*   22.0
Corn                    F  26.3*   20.6    18.6    18.5    16.0*
Live cattle             F  21.8    19.9    20.0    19.5    18.7

Starred averages are significantly different from 20%, at the 5% significance level. The average
standard error for the above estimated proportions is approximately 1%.

To compare standard deviations across days of the week, without modeling
other volatility effects associated with conditional heteroskedasticity, we can use
robust estimates of the proportion of weekly variance attributable to each day.
Suppose a market is open for six consecutive days commencing with a particular
Friday denoted by day t. Let $r_{t+i}$ be the return for day t + i, as usual, and let

$$w_{t,i} = \frac{(r_{t+i} - \bar{r}_i)^2}{\sum_{j=1}^{5} (r_{t+j} - \bar{r}_j)^2}$$

with $\bar{r}_i$ the mean return for day i (Monday is i = 1, etc.). The quantities $w_{t,i}$ can
be averaged across weeks to give estimates of the proportion of the total variance
in a week associated with each day. The estimates will be consistent but may be
slightly biased.
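
The averaging of these weekly proportions can be sketched as follows. The function name and the simulated weeks are assumptions for illustration: each simulated week holds five independent normal returns, with a hypothetical Monday standard deviation that is 20% higher than on the other days.

```python
import random


def weekly_variance_proportions(weeks):
    """Average percentage share of each weekday in the weekly sum of squared deviations.

    `weeks` is a list of complete weeks, each a list of five returns (Monday first).
    """
    n_weeks = len(weeks)
    day_means = [sum(week[i] for week in weeks) / n_weeks for i in range(5)]
    totals = [0.0] * 5
    used = 0
    for week in weeks:
        squares = [(week[i] - day_means[i]) ** 2 for i in range(5)]
        denominator = sum(squares)
        if denominator == 0.0:
            continue  # the proportions are undefined for such a week
        for i in range(5):
            totals[i] += squares[i] / denominator
        used += 1
    return [100.0 * t / used for t in totals]


# Hypothetical data: 2000 complete weeks with more variable Monday returns.
random.seed(1)
weeks = [[random.gauss(0.0, 1.2 if i == 0 else 1.0) for i in range(5)]
         for _ in range(2000)]
proportions = weekly_variance_proportions(weeks)
print([round(p, 1) for p in proportions])
```

Because each week contributes proportions that sum to one, a single week of extreme returns cannot dominate the estimates, which is the sense in which they are robust.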

Table 4.5 lists percentage estimates for the twenty series. The standard errors 
of these estimates are all approximately 1%. The starred estimates in this table are
significantly different from 20%, using a 5% significance level and a two-tail test. 
The five possible tests for any row are not independent because the proportions 
must sum to one. There are no starred estimates for the US equity series and thus 
these estimates provide no evidence for a weekly volatility seasonal pattern. The 
Monday returns for the UK and Japanese index series have significantly more 
variability than those for other days. The high figure for the Nikkei index may
reflect trading on Saturdays for some of the period considered. 

The Treasury bond futures estimates indicate significantly more variability in 
Friday returns and this reflects the timing of US macroeconomic news announce- 
ments, many of which are made early on Friday mornings (Harvey and Huang 
1991; Ederington and Lee 1993, 1995). All the currency futures series, as well 
as the gold and corn futures series, have significantly more variability in Mon- 
day returns. Baillie and Bollerslev (1989b) report the same conclusion for spot 
currency rates. Futures variability is low on Wednesdays for many series. Typical 
Monday and Wednesday proportions are 24% and 17% and this suggests Mon- 
day standard deviations are approximately one-fifth higher than the Wednesday 
numbers. 
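
The step from variance proportions to standard deviations is a square root. As a rough check, treating the typical proportions of 24% and 17% as shares of the same weekly variance gives

```latex
\frac{\sigma_{\mathrm{Mon}}}{\sigma_{\mathrm{Wed}}}
  \approx \sqrt{\frac{0.24}{0.17}}
  \approx \sqrt{1.41}
  \approx 1.19 ,
```

so the Monday standard deviation is about 19% higher, in line with the one-fifth quoted above.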

Holiday returns generally have higher standard deviations than other returns 
and the opposite result generally holds for pre-holiday returns. When standard 
deviations are ranked for the seven categories used in Table 4.4, with the crash 
week excluded, pre-holiday returns have the lowest rank for thirteen series and 
holiday returns have the highest rank for a different set of thirteen series; holiday 
returns outrank pre-holiday returns for nineteen of the twenty series. The holiday 
volatility effect is particularly marked for the S&P 500 spot series; the pre-holiday 
returns have standard deviation 0.57%, the Monday to Friday statistics range from 
0.78% to 0.97%, and the holiday returns statistic is 1.09%.

Identifying further calendar patterns in standard deviations may not be possible 
for the twenty series. Estimates for subperiods of the month do not show any dis- 
cernible pattern. Standard deviation estimates have been noticeably higher in some 
months than others (e.g. October for stocks) but this is probably a consequence 
of general changes in volatility that can produce higher levels of conditional vari- 
ances for several weeks. 


4.6 Skewness and Kurtosis 


Skewness statistics are sometimes used to assess the symmetry of distributions, 
while kurtosis statistics are often interpreted as a measure of similarity to a normal 
distribution. 

These statistics are sensitive to extreme observations because they make use of 
the third and fourth powers of the observations, respectively. This is a particular 
problem for stock series that include the 1987 crash. Consequently, the crash 
week is excluded from the stock series for the calculations of the skewness and 
kurtosis statistics listed in Table 4.1. The S&P 500 spot series has skewness equal 
to -3.58 for the entire ten years of daily returns but this figure falls to -0.67
when the crash week is excluded. The kurtosis for the same series falls from 77.0 
to 10.4 when the crash week is removed. There are similar large changes for the 
S&P 500 futures series and for both FTSE 100 series, with important but less
dramatic changes for the individual stocks and the Nikkei index. 

The three-month sterling bill futures have exceptional skewness and kurtosis 
values. Once more, these can be attributed to an extreme outlier. On Thursday, 17 
September 1992, the day after sterling left the European Exchange Rate Mechanism,
sterling interest rates fell sharply and the futures return was 2.9%, some 18
standard deviations away from the average. When the week containing this crisis
is excluded, the skewness falls from 2.29 to -0.15 and the kurtosis from 59.8 to
20.8. There are further extreme outliers in January 1985. 

The standard error of a skewness estimate b calculated from n returns depends 
on n and the population distribution. It equals $\sqrt{6/n}$ for a random sample from a
normal distribution. This formula is of minimal value because returns have excess 
kurtosis that increases the standard error. Few of the estimates b are far from zero 
and hence they do not provide much evidence that the returns distributions are 
not symmetric. The S&P 500 index figures, however, are both less than -0.5.
There is some evidence for negative skewness in US index returns (Campbell and 
Hentschel 1992) and for slight positive skewness in company returns (Perry 1982) 
that is higher, on average, for small firms and nonsurviving firms (Duffee 1995). 
However, almost all the evidence for unconditional skewness in US stock returns 
may be a consequence of very occasional negative outliers; for some evidence, 
see Harvey and Siddique (1999) for the S&P 500 index and Blair et al. (2002) for 
the constituents of the S&P 100 index. 

All twenty sets of returns are leptokurtic, since all the estimates of kurtosis in 
Table 4.1 exceed 3, which is the value for normal distributions. The standard error 
of a kurtosis estimate k is $\sqrt{24/n}$ for a random sample from a normal distribution.
This equals approximately 0.1 for our series. Nineteen of the twenty estimates in 
Table 4.1 exceed 3 by more than ten of these standard errors. It is very clear that 
the returns-generating process is not even approximately Gaussian. This is an old 
conclusion that may first have been established in Alexander (1961). It has since 
been shown for almost all series of daily and more frequent returns. 
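
The sensitivity of these statistics to a single extreme return can be illustrated with a minimal sketch. The function names are hypothetical, the moments use the divisor n (the estimates b and k may be defined with a slightly different divisor), and the data are simulated: 2400 standard normal returns, to which one hypothetical outlier of -20 is then added.

```python
import math
import random


def skewness_kurtosis(returns):
    """Sample skewness and kurtosis from central moments with divisor n."""
    n = len(returns)
    mean = sum(returns) / n
    m2 = sum((r - mean) ** 2 for r in returns) / n
    m3 = sum((r - mean) ** 3 for r in returns) / n
    m4 = sum((r - mean) ** 4 for r in returns) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def normal_standard_errors(n):
    """Standard errors of skewness and kurtosis for a random normal sample."""
    return math.sqrt(6 / n), math.sqrt(24 / n)


random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(2400)]
b, k = skewness_kurtosis(sample)
b_out, k_out = skewness_kurtosis(sample + [-20.0])  # one hypothetical crash day
se_b, se_k = normal_standard_errors(2400)

print(f"without outlier: skewness {b:.2f}, kurtosis {k:.2f}")
print(f"with outlier:    skewness {b_out:.2f}, kurtosis {k_out:.2f}")
print(f"standard errors: {se_b:.2f} and {se_k:.2f}")
```

One observation among 2400 is enough to move both statistics far from their normal values of 0 and 3, which is why the crash week is excluded from the stock series.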


4.7 The Shape of the Returns Distribution


The first important stylized fact for daily returns is a remark about their distribu- 
tion: 

1. The distribution of returns is not normal. 
Instead, we can say of the distribution that 

* it is approximately symmetric;
* it has fat tails;
* it has a high peak.
Table 4.6. Relative frequencies for samples of returns.

Percent of returns showing no change, and percentage of returns within/beyond
the stated number of standard deviations from the mean.

                                No Within Within Beyond Beyond Beyond Beyond Beyond Beyond Beyond
                            change   0.25    0.5      1    1.5      2      3      4      5      6

Normal distribution                 19.74  38.29  31.73  13.36   4.55   0.27   0.01   0.00   0.00
Series
S&P 500-share           S     0.00  30.88  52.67  19.73   8.50   3.28   0.91   0.47   0.24   0.20
S&P 500-share           F     1.38  34.12  58.20  16.29   6.96   2.85   0.71   0.40   0.24   0.20
Coca Cola               S     6.64  24.87  48.24  22.74   8.82   3.91   0.95   0.47   0.20   0.12
General Electric        S     6.37  25.07  47.61  23.53  10.40   4.98   1.15   0.40   0.20   0.08
General Motors          S     6.56  24.75  45.83  24.87  10.48   4.47   0.83   0.16   0.08   0.08
FT 100-share            S     0.28  22.46  43.46  24.20   7.63   2.97   0.83   0.47   0.24   0.20
FT 100-share            F     2.53  23.37  44.33  23.49   8.26   3.32   0.99   0.47   0.28   0.16
Glaxo                   S     7.12  26.70  50.44  22.82   9.69   4.75   1.66   0.44   0.24   0.12
Marks & Spencer         S    14.60  21.36  44.66  26.78  11.87   5.06   1.15   0.28   0.04   0.04
Shell                   S     9.81  23.38  46.00  26.07  10.68   4.35   1.15   0.51   0.28   0.08
Nikkei 225-share        S     0.28  30.60  56.01  19.85  10.02   5.24   1.54   0.45   0.24   0.12
Treasury bonds          F     2.33  24.96  46.16  26.74  11.83   6.01   0.99   0.20   0.04   0.00
3-month sterling bills  F     6.02  37.08  60.07  16.78   7.72   4.04   1.39   0.75   0.36   0.20
DM/$                    F     2.45  25.35  44.80  26.65  11.98   5.54   1.19   0.24   0.04   0.04
Sterling/$              F     2.61  26.53  46.94  26.49  11.23   5.42   1.27   0.36   0.08   0.00
Swiss franc/$           F     1.30  23.61  43.69  27.80  12.50   5.10   0.91   0.20   0.04   0.04
Yen/$                   F     2.02  28.00  48.56  24.71  11.94   5.34   1.30   0.24   0.16   0.08
Gold                    F     1.67  31.40  54.28  22.40  11.50   6.42   1.94   0.52   0.08   0.00
Corn                    F     4.03  24.84  47.07  23.69  10.40   5.58   1.70   0.40   0.04   0.04
Live cattle             F     2.21  22.78  44.44  28.39  13.72   6.72   0.00   0.00   0.00   0.00
Averages
Spot series                         25.56  48.32  23.40   9.79   4.33   1.13   0.41   0.19   0.11
Futures series                      27.46  48.96  23.95  10.73   5.12   1.13   0.34   0.12   0.07
All series                          26.61  48.67  23.70  10.31   4.77   1.13   0.37   0.15   0.09
Series with crash excluded
S&P 500-share           S     0.00  27.93  48.93  24.01  11.05   4.64   1.07   0.44   0.20   0.12
S&P 500-share           F     1.39  28.25  50.67  23.02  11.09   5.11   1.47   0.44   0.24   0.12
Coca Cola               S     6.66  22.54  44.93  26.31  11.13   4.99   1.11   0.44   0.12   0.00
General Electric        S     6.38  23.81  45.40  25.67  11.21   5.67   1.19   0.28   0.08   0.04
General Motors          S     6.58  23.97  44.06  26.66  11.49   5.15   1.11   0.24   0.00   0.00
FT 100-share            S     0.28  20.36  39.98  28.20   9.98   3.92   0.95   0.40   0.12   0.08
FT 100-share            F     2.54  21.59  41.40  27.73  10.74   4.32   1.03   0.40   0.16   0.04
Glaxo                   S     7.13  25.45  48.71  24.10  10.19   5.27   1.63   0.44   0.16   0.12
Marks & Spencer         S    14.63  20.93  44.15  27.47  12.56   5.19   1.03   0.16   0.00   0.00
Shell                   S     9.83  22.79  44.99  27.07  11.93   4.72   0.99   0.36   0.12   0.00
Nikkei 225-share        S     0.28  29.40  54.05  21.07  10.70   5.94   1.67   0.45   0.16   0.04
Averages, all stock series          24.28  46.12  25.58  11.10   4.99   1.20   0.36   0.12   0.05


The percentages of standardized daily returns, \((r_t - \bar{r})/s\), within various
intervals are shown in Table 4.6 for the twenty illustrative series. Table 4.7
summarizes this information as frequencies for the magnitudes of the standardized
daily returns, averaged across all series.

Table 4.7. Average frequencies for standardized daily returns.

                                       Observed
Range           Observed    Normal     minus normal

0 to 0.25         26.6%      19.7%         6.9%
0.25 to 0.5       22.1%      18.6%         3.5%
0.5 to 1          27.6%      30.0%        -2.4%
1 to 1.5          13.4%      18.4%        -5.0%
1.5 to 2           5.5%       8.8%        -3.3%
2 to 3             3.6%       4.3%        -0.6%
3+                 1.1%       0.3%         0.8%


Figure 4.1. S&P 500 returns distribution. [Observed and normal densities plotted against the standardized return, from -4 to 4.]


The first two rows of Table 4.7 show there are more observations in the range
from \(\bar{r} - 0.5s\) to \(\bar{r} + 0.5s\) than are expected from a normal distribution,
corresponding to a high peak in empirical distributions. The final row shows there
are also more extreme observations, either below \(\bar{r} - 3s\) or above \(\bar{r} + 3s\),
corresponding to two fat tails. The high values of kurtosis are caused by the
outliers in the tails. As the frequencies total 100%, there must be fewer
observations elsewhere, within the ranges from \(\bar{r} \pm 0.5s\) to \(\bar{r} \pm 3s\). Note
that the high peak and fat tails effects are interdependent, because extreme
returns contribute large squared returns to the variance of the distribution,
which implies there must be more observations near the center of the distribution
than are found for a normal distribution having the same mean and variance.
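
The "Normal" column of Table 4.7 can be checked directly from the standard normal distribution function. The sketch below is only an illustration: the function names are hypothetical, and the observed percentages are copied from Table 4.7.

```python
import math


def prob_abs_z_below(c):
    """Return P(|Z| < c) for a standard normal variable Z."""
    return math.erf(c / math.sqrt(2.0))


def normal_range_percentages(edges):
    """Percentages of normal observations with |z| between consecutive edges.

    The final entry is the percentage beyond the last edge.
    """
    cumulative = [prob_abs_z_below(c) for c in edges]
    pcts = [100.0 * (hi - lo) for lo, hi in zip(cumulative[:-1], cumulative[1:])]
    pcts.append(100.0 * (1.0 - cumulative[-1]))
    return pcts


EDGES = [0.0, 0.25, 0.5, 1.0, 1.5, 2.0, 3.0]
LABELS = ["0 to 0.25", "0.25 to 0.5", "0.5 to 1", "1 to 1.5",
          "1.5 to 2", "2 to 3", "3+"]
OBSERVED = [26.6, 22.1, 27.6, 13.4, 5.5, 3.6, 1.1]  # "Observed" column, Table 4.7

NORMAL = normal_range_percentages(EDGES)

for label, obs, norm in zip(LABELS, OBSERVED, NORMAL):
    # Positive differences mark the high peak and the fat tails.
    print(f"{label:<12} observed {obs:5.1f}%  normal {norm:5.1f}%  "
          f"difference {obs - norm:+5.1f}%")
```

The differences are positive only in the two central ranges and in the final range, which is the high-peak and fat-tail pattern described above.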

Figure 4.2. DM returns distribution. [Observed and normal densities plotted against the standardized return, from -4 to 4.]

Figures 4.1 and 4.2 compare kernel estimates of the probability distribution
for standardized returns, \(z_t = (r_t - \bar{r})/s\), with the normal distribution for the
spot S&P 500 returns and the DM/$ futures returns, respectively. These density
estimates \(\hat{f}(z)\) have been calculated as

\[ \hat{f}(z) = \frac{1}{nB} \sum_{t=1}^{n} \phi\left(\frac{z - z_t}{B}\right) \qquad (4.6) \]

with \(\phi(\cdot)\) the density of the standard normal distribution and the bandwidth \(B\) a
decreasing function of the sample size \(n\) (Silverman 1986). As the standardized
returns have unit variance, it is acceptable to use

\[ B = n^{-0.2}. \qquad (4.7) \]

The figures show clearly the high peaks. It is, however, difficult to discern the 
fat tails because there is not much probability in the tails, although there are 
more observations than expected from a normal distribution. The crash week is 
excluded from the S&P 500 calculations; the distribution is even more peaked if 
this week is included. More details of the crash effect are provided in Table 4.6, 
where the distributions are summarized for all returns and again for the stock 
returns when the crash week is excluded. 
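
The kernel estimate in (4.6) with the bandwidth in (4.7) can be sketched in a few lines. This is a minimal illustration, not the code used to produce the figures: the function names are hypothetical, the returns are simulated from an assumed two-regime normal mixture, and the sample standard deviation is assumed to use the divisor n - 1.

```python
import math
import random


def standard_normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def standardize(returns):
    """Return z_t = (r_t - mean) / s (divisor n - 1 is an assumption)."""
    n = len(returns)
    mean = sum(returns) / n
    s = math.sqrt(sum((r - mean) ** 2 for r in returns) / (n - 1))
    return [(r - mean) / s for r in returns]


def kernel_density(z, z_values, bandwidth=None):
    """Gaussian kernel estimate (4.6); default bandwidth is n ** -0.2 as in (4.7)."""
    n = len(z_values)
    b = n ** -0.2 if bandwidth is None else bandwidth
    return sum(standard_normal_pdf((z - zt) / b) for zt in z_values) / (n * b)


# Hypothetical data: mostly calm days with occasional turbulent days.
rng = random.Random(0)
simulated_returns = [
    rng.gauss(0.0, 0.03 if rng.random() < 0.1 else 0.008) for _ in range(500)
]
z_values = standardize(simulated_returns)

print("estimated density at 0:", round(kernel_density(0.0, z_values), 3))
print("normal density at 0:   ", round(standard_normal_pdf(0.0), 3))
```

Evaluating `kernel_density` on a grid of z values gives a curve that can be plotted against the normal density, as in Figures 4.1 and 4.2.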

A general observation about the extreme returns is that about 1% of all the
daily returns in the twenty series are more than three standard deviations from the
mean. This is about four times the normal figure. We could conjecture that these
“3 s.d.” events occur (very approximately) three times a year. The frequency of
more extreme outliers is also documented in Table 4.6. The average frequency of
observations more than four standard deviations distant from the mean is close to
0.4% and thus this event occurred, on average, once a year for each asset. Such
events would only occur once in sixty years if daily returns were observations from
a normal distribution. A typical standard deviation for a daily stock index return
has been 1% and so we might expect an index to rise or fall by more than 4%,
from one market close to the next, very approximately once a year. Statements
like this must be interpreted as approximate results that depend on the returns
process having a similar distribution in the future to that estimated from previous
data.
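
The arithmetic behind these statements can be reproduced as follows. The sketch assumes 250 trading days per year (a round figure that is not stated in the text) and takes the observed frequencies from the "All series" row of Table 4.6; the names are illustrative.

```python
import math

TRADING_DAYS_PER_YEAR = 250  # assumed round figure, not taken from the text


def normal_two_sided_tail(k):
    """Return P(|Z| > k) for a standard normal variable Z."""
    return math.erfc(k / math.sqrt(2.0))


def events_per_year(probability, days=TRADING_DAYS_PER_YEAR):
    """Expected number of days per year on which the event occurs."""
    return probability * days


observed_beyond_3 = 0.0113  # 1.13%, "All series" row of Table 4.6
observed_beyond_4 = 0.0037  # 0.37%, same row

ratio_3sd = observed_beyond_3 / normal_two_sided_tail(3.0)
observed_3sd_per_year = events_per_year(observed_beyond_3)
observed_4sd_per_year = events_per_year(observed_beyond_4)
normal_years_per_4sd = 1.0 / events_per_year(normal_two_sided_tail(4.0))

print(f"observed / normal frequency beyond 3 s.d.: {ratio_3sd:.1f}")
print(f"observed 3 s.d. events per year: {observed_3sd_per_year:.1f}")
print(f"observed 4 s.d. events per year: {observed_4sd_per_year:.1f}")
print(f"years between 4 s.d. events under normality: {normal_years_per_4sd:.0f}")
```

Under these assumptions the ratio is a little above four, the observed rates are roughly three and one events per year, and the normal waiting time for a 4 s.d. event is a little over sixty years.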

Inferences from extreme values may be relevant for risk management, but 
reliable inferences require sufficient extremes and the assumption of stationarity. 
Extreme value theory applied to financial returns is not covered in any detail in this 
book. Some results are noted in Section 12.12 and some interesting references are 
Loretan and Phillips (1994), Longin (1996), Tsay (2002, Chapter 7), and Poon, 
Rockinger, and Tawn (2004). 

Many returns are zero and this helps to explain the peaked centers of the empir- 
ical distributions. The percentage of days having a zero price change from the 
previous day is given in Table 4.6 under the heading “Percent no change.” These 
zero changes are not simply a consequence of infrequent trading. Each of the 
three US stock return series has more than 6% of the returns equal to zero, but the
stocks all have trading volumes measured in tens of thousands of shares every day. 
The US stock zeros are a consequence of the minimum price movement being rel- 
atively large compared with the standard deviation of returns. The average price 
for these series is about $60 and the minimum price change was one-eighth of 
a dollar; thus the smallest possible positive return was typically 0.2%, which is 
more than 0.1 s.d. The chance of a normal variable being within 0.05 s.d. of the 
mean value is 4%, so the number of US stock zeros can be explained by discrete 
prices combined with a peaked distribution. 
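
The two numbers in this argument follow from short calculations, sketched below with illustrative variable names. The price and tick size are those quoted in the text.

```python
import math

tick = 0.125          # minimum price change: one-eighth of a dollar
average_price = 60.0  # approximate average price of the three US stocks

# Smallest possible positive return, as a fraction of the price.
smallest_return = tick / average_price


def prob_within(c):
    """Return P(|Z| < c) for a standard normal variable Z."""
    return math.erf(c / math.sqrt(2.0))


# Chance that a normal variable lies within 0.05 s.d. of its mean.
normal_chance = prob_within(0.05)

print(f"smallest positive return: {100 * smallest_return:.2f}%")
print(f"normal probability within 0.05 s.d.: {100 * normal_chance:.1f}%")
```

The first figure is about 0.2% and the second about 4%, so an observed zero frequency above 6% needs more probability near the center than a normal distribution provides.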


4.8 Probability Distributions for Returns 


A satisfactory probability distribution for daily returns must have high kurtosis and 
be either exactly or approximately symmetric. We now review several distributions 
that have these properties. 

Praetz (1972), Clark (1973), and many others have argued that observed returns
come from a mixture of normal distributions. There is then some mixing variable,
\(\omega_t\), that defines a set of conditional normal distributions for returns:

\[ r_t \mid \omega_t \sim N(\mu, f(\omega_t)) \qquad (4.8) \]

for some function \(f\). The quantity \(\sigma_t^2 = f(\omega_t)\) is a conditional variance and, to
simplify our discussion, the mean return is supposed to be the constant \(\mu\). Few
assumptions are then required to guarantee that the unconditional distribution of
returns has excess kurtosis, as is shown later in Chapters 8, 9, and 11.
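
One way to see why so few assumptions are needed is the following sketch. It writes \(\sigma_t^2 = f(\omega_t)\) for the conditional variance, uses only the conditional normality in (4.8), and assumes that \(E[\sigma_t^4]\) is finite; it is offered as an illustration and is not necessarily the argument used in the later chapters.

```latex
\begin{aligned}
E[(r_t - \mu)^2] &= E\big[E[(r_t - \mu)^2 \mid \omega_t]\big] = E[\sigma_t^2], \\
E[(r_t - \mu)^4] &= E\big[E[(r_t - \mu)^4 \mid \omega_t]\big] = 3\,E[\sigma_t^4], \\
\text{kurtosis} &= \frac{E[(r_t - \mu)^4]}{\big(E[(r_t - \mu)^2]\big)^2}
                 = \frac{3\,E[\sigma_t^4]}{\big(E[\sigma_t^2]\big)^2} \;\ge\; 3 .
\end{aligned}
```

The final step is Jensen's inequality, \(E[\sigma_t^4] \ge (E[\sigma_t^2])^2\), which is strict unless \(\sigma_t^2\) is constant. Any genuine variation in the conditional variance therefore produces kurtosis above the normal value of 3.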

The mixing variable \(\omega_t\) has been associated with observable quantities such
as trading volume (Clark 1973; Ghysels, Gourieroux, and Jasiak 1998) and the
number of transactions (Harris 1987; Jones, Kaul, and Lipson 1994; Ané and
Geman 2000). Another interpretation is that \(\omega_t\) is the number of new items of
relevant information absorbed by the market on day \(t\) (Beckers 1981; Tauchen and
Pitts 1983; Gallant, Hsieh, and Tauchen 1991), although then there is no practical
possibility of observing outcomes for \(\omega_t\). This issue is unimportant for appropriate
moment tests of the mixture hypothesis (Richardson and Smith 1994a). Mixture
models require further structure to ensure that they define a satisfactory model
for returns; this structure is provided in Chapters 9-11. In particular, the existence
of conditional heteroskedasticity shows that variables such as \(\omega_t\) and \(\omega_{t+1}\) are
not independent.

Whatever interpretation is given to \(\omega_t\), choices can be made for the distribution
of the conditional variance \(\sigma_t^2 = f(\omega_t)\) that defines particular unconditional
distributions for returns. Praetz (1972) favored an inverse gamma distribution
for \(\sigma_t^2\) so that returns have a generalized Student t-distribution with degrees of
freedom \(\nu > 2\). Returns then have a finite central moment of order \(N\) if and
only if \(N < \nu\). In particular, the unconditional kurtosis is finite when \(\nu > 4\) and
then equals \(3 + 6/(\nu - 4)\). As \(\nu \to \infty\), the generalized t-distribution converges
to a normal distribution. Blattberg and Gonedes (1974) use likelihood ratios and
other methods to claim that Student distributions provide a better fit than stable
distributions to US stock returns. The majority of their estimates of \(\nu\) are between
4 and 6. Kon (1984) has all 33 estimates of \(\nu\) between 3 and 6. Bollerslev (1987)
extends the mixture model to produce conditional t-distributions in an ARCH
framework.

Clark (1973) supposed that \(\sigma_t^2\) has a lognormal distribution and this choice has
become very popular in the stochastic volatility literature. Returns are then said
to have a lognormal-normal distribution. All the moments of the unconditional
distribution for returns are then finite but the density function must be represented
using an integral. The moments, however, can be found without difficulty, as is
shown in Section 11.5. In particular, the unconditional kurtosis is \(3\exp(V)\) when
\(\log(\sigma_t^2)\) has variance \(V\). Typical estimates of \(V\) are at least 0.5 (Taylor 1986;
Jacquier, Polson, and Rossi 1994).
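
The two kurtosis formulas quoted above, 3 + 6/(ν - 4) for the generalized t-distribution and 3 exp(V) for the lognormal-normal distribution, are easy to evaluate and to invert. The following sketch uses hypothetical function names and simply restates those formulas.

```python
import math


def student_t_kurtosis(nu):
    """Kurtosis 3 + 6 / (nu - 4) of a Student t-distribution, finite only for nu > 4."""
    if nu <= 4:
        raise ValueError("kurtosis is not finite unless nu > 4")
    return 3.0 + 6.0 / (nu - 4.0)


def lognormal_normal_kurtosis(v):
    """Kurtosis 3 * exp(V) when the log conditional variance has variance V."""
    return 3.0 * math.exp(v)


def nu_for_kurtosis(k):
    """Degrees of freedom that give kurtosis k > 3 for the t-distribution."""
    return 4.0 + 6.0 / (k - 3.0)


def v_for_kurtosis(k):
    """Variance V of the log conditional variance that gives kurtosis k > 3."""
    return math.log(k / 3.0)


print("t kurtosis for nu = 5, 6, 10:",
      [round(student_t_kurtosis(nu), 2) for nu in (5, 6, 10)])
print("lognormal-normal kurtosis for V = 0.5:",
      round(lognormal_normal_kurtosis(0.5), 2))
print("nu giving kurtosis 6:", nu_for_kurtosis(6.0))
print("V giving kurtosis 6:", round(v_for_kurtosis(6.0), 3))
```

A kurtosis of 6 corresponds to ν = 6 and to V = log 2, about 0.69, while the typical estimate V = 0.5 implies a kurtosis of almost 5.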

Further suggestions for the distribution of \(\sigma_t^2\) are a linear function of a Poisson
variable (Press 1967), an unconstrained, discrete distribution (Kon 1984; Rydén,
Teräsvirta, and Åsbrink 1998), and a gamma distribution (Madan and Seneta
1990). Kim and Kon (1994) make some comparisons. Other distributions used
in finance research are also mixtures of normal distributions. For example, the
generalized error distribution used by Nelson (1991) is a complicated mixture
(Hsu 1980, 1982).

The gamma and the inverse gamma distributions for \(\sigma_t^2\) mentioned above,
like the inverse Gaussian (IG) distribution, are special cases of the generalized
inverse Gaussian (GIG) distribution. The distribution of returns is called the
normal inverse Gaussian (NIG) distribution when \(\sigma_t^2 \sim \mathrm{IG}\) and the generalized
hyperbolic distribution when \(\sigma_t^2 \sim \mathrm{GIG}\). Barndorff-Nielsen and Shephard (2001)
show that these distributions are useful for modeling returns measured over a
variety of timescales.

Figure 4.3. Probability density functions.

Figure 4.4. Tail densities.

Figures 4.3 and 4.4 compare the probability density functions of normal, Stu- 
dent t, lognormal-normal (LNN), and generalized error (GED) distributions. The 
parameters of these distributions are chosen so that they have mean 0, variance 1, 
and kurtosis equal to either 3 (normal) or 6 (t, LNN, GED). Definitions and mathe- 
matical results for the three fat-tailed density functions are provided in Sections 9.6 
(t, GED) and 11.5 (LNN). The plotted density functions are all symmetric about 
0. Figure 4.3 shows that all the nonnormal densities are more peaked than the nor- 
mal and this effect is most pronounced for the GED. The normal density crosses 
the other density functions between 0.45 and 0.75 and is the highest between 0.8 
and 2.3. Figure 4.4 shows densities for the tails of the distributions. The normal 
density crosses the others again between 2.3 and 2.6 and soon becomes small 
relative to the other densities. The GED density is the highest between 2.4 and 
6.0, after which the t-density has the highest values. 

Finally, we mention the infinite-variance stable (Pareto-Lévy) distribution
advocated by Mandelbrot (1963) and Fama (1965), although few researchers now
use it. This distribution is covered in detail by Rachev and Mittnik (2000). There
is no compact formula for the general density function but it can be derived
by numerical methods. The most important parameter of the stable distribution
is its characteristic exponent \(\alpha\) with \(0 < \alpha \le 2\). Any k-period return defined
by the sum \(r_t + \cdots + r_{t+k-1}\) has the same characteristic exponent when the \(r_t\)
are independent and identically distributed. The exponent equals 2 for normal
distributions, but defines a distribution whose variance is infinite when it is less
than 2. Fama and Roll (1971) describe a quantile method for estimating \(\alpha\) that
has often been applied, while Akgiray and Lamoureux (1989) and Rachev and
Mittnik (2000) survey and compare estimation methods. Studies of US stock
returns, however, reject the stable distribution as a satisfactory model (Blattberg
and Gonedes 1974; Hagerman 1978; Perry 1983). Hagerman shows that estimates
of \(\alpha\) steadily increase from about 1.5 for daily returns to about 1.9 for returns
measured over 35 days. Monthly returns have distributions much closer to the
normal shape than those of daily returns and this contradicts the stable hypothesis.


4.9 Autocorrelations of Returns 


So far we have only reviewed properties of the distribution of one return. Now
we consider measures of the dependence between the returns for time periods \(t\)
and \(t + \tau\), which are separated by \(\tau\) time periods.

The correlation between returns \(\tau\) periods apart is estimated from \(n\) observations
by the sample autocorrelation at lag \(\tau\),

\[ \hat{\rho}_{\tau,r} = \sum_{t=1}^{n-\tau} (r_t - \bar{r})(r_{t+\tau} - \bar{r}) \Big/ \sum_{t=1}^{n} (r_t - \bar{r})^2, \qquad \tau > 0, \qquad (4.9) \]

with \(\bar{r}\) the sample mean of all \(n\) observations. The symbol \(\hat{\rho}\) indicates that the
sample statistic estimates a correlation parameter \(\rho\) of a stochastic process when
the data come from a stationary process. The two subscripts \(\tau\) and \(r\) respectively
state the lag and the series that provide the estimates. The definition in (4.9) is
standard in time series literature. It provides very similar estimates to the correlation
between the \(n - \tau\) pairs of observations \((r_t, r_{t+\tau})\) for long time series.
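
The estimate in (4.9) is straightforward to compute. The function below is a minimal sketch with a hypothetical name; it uses the mean of all n observations in both the numerator and the denominator, as the definition requires.

```python
def sample_autocorrelation(returns, lag):
    """Sample autocorrelation at a positive lag, following definition (4.9)."""
    n = len(returns)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < n")
    mean = sum(returns) / n
    deviations = [r - mean for r in returns]
    # Numerator: n - lag products of mean-adjusted returns that are lag periods apart.
    numerator = sum(deviations[t] * deviations[t + lag] for t in range(n - lag))
    # Denominator: sum of squares over all n observations.
    denominator = sum(d * d for d in deviations)
    return numerator / denominator


# Tiny worked example: deviations are -2, -1, 0, 1, 2.
example = [1.0, 2.0, 3.0, 4.0, 5.0]
print(sample_autocorrelation(example, 1))  # (2 + 0 + 0 + 2) / 10 = 0.4
```

Because the numerator has n - τ terms while the denominator has n, the estimate is shrunk slightly toward zero compared with the ordinary correlation between the pairs, which is negligible for long series.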

The definition is not altered here for series of futures returns taken from several 
contracts. Returns from a pair of futures contracts during the same period are 
similar because arbitrage principles imply they have a common dependence upon 
spot returns. Consequently, the distribution of the product of two mean-adjusted 
returns changes little when the two terms are for different futures contracts. 

Autocorrelation estimates are calculated with the implicit assumption that
expected returns, \(E[r_t]\), are constant. Changes in expected returns would have to be
substantial to make \(\hat{\rho}_{\tau,r}\) a seriously biased estimate of the correlation between \(r_t\)
and \(r_{t+\tau}\). This is shown in Appendix 4.13 for day-of-the-week and other possible
calendar determinants of expected returns. The consequences of changes in
risk-free rates and risk premia for autocorrelation estimates are also minor and
their discussion is postponed until Section 6.10.

Table 4.8. Autocorrelations for returns.

                           Lags 1-5                                   Lags 1-30
                           Autocorrelations                           Category frequency
Series                            1       2       3       4       5       1    2    3    4    5    6
S&P 500-share S               0.101  -0.033  -0.026  -0.020   0.056       0    0   17   11    1    1
S&P 500-share F              -0.029  -0.117  -0.023  -0.029   0.063       1    1   17   10    1    0
Coca Cola S                  -0.035  -0.100  -0.038  -0.048   0.060       1    0   16   12    1    0
General Electric S           -0.023  -0.061  -0.035  -0.012   0.042       0    2   16   12    0    0
General Motors S             -0.003  -0.069  -0.009  -0.044   0.027       0    1   14   15    0    0
FT 100-share S                0.066  -0.001   0.020   0.073   0.010       0    1   12   14    3    0
FT 100-share F                0.012  -0.030   0.023   0.046   0.004       0    0    9   21    0    0
Glaxo S                       0.080  -0.018  -0.013  -0.023   0.007       0    0   15   14    1    0
Marks & Spencer S             0.034  -0.006  -0.044   0.003  -0.016       0    0   17   13    0    0
Shell S                       0.045   0.052  -0.001  -0.005   0.013       0    1   14   14    1    0
Nikkei 225-share S            0.035  -0.099  -0.003   0.062  -0.027       0    1   14   14    1    0
Treasury bonds F              0.030   0.018  -0.011  -0.016  -0.036       0    1   12   17    0    0
3-month sterling bills F      0.056  -0.083   0.018  -0.031   0.045       0    1   12   14    3    0
DM/$ F                       -0.001   0.001   0.015  -0.032   0.022       0    1   11   17    1    0
Sterling/$ F                  0.027  -0.010  -0.019  -0.016   0.004       0    0   16   14    0    0
Swiss franc/$ F              -0.011  -0.004   0.006  -0.028   0.014       0    1   12   16    1    0
Yen/$ F                      -0.002   0.013   0.018  -0.004   0.035       0    0   10   20    0    0
Gold F                       -0.055   0.061   0.019  -0.067   0.020       0    3   12   14    1    0
Corn F                        0.101  -0.032   0.013  -0.018  -0.023       0    0   14   13    2    1
Live cattle F                 0.011  -0.009   0.043   0.022  -0.007       0    0   16   13    1    0

Autocorrelation averages and frequency totals
Spot series S                 0.033  -0.037  -0.017  -0.001   0.019       1    6  135  119    8    1
Futures series F              0.013  -0.017   0.009  -0.016   0.011       1    8  141  169   10    1
All series                    0.022  -0.026  -0.002  -0.009   0.015       2   14  276  288   18    2
All series, crash excluded    0.024  -0.015  -0.008  -0.013  -0.000       0   11  293  282   14    0

The six categories that summarize the signs and magnitudes of the autocorrelations are (1) below -0.1,
(2) between -0.1 and -0.05, (3) between -0.05 and 0, (4) between 0 and 0.05, (5) between 0.05 and
0.1, and (6) above 0.1. The final row of averages presents average values after all returns in October 1987
have been excluded.

The sample autocorrelations of returns are generally close to zero, regardless 
of the time lag. This is the second important stylized fact for daily returns: 


2. There is almost no correlation between returns for different days. 


Statements like this can be found as far back as research by Working (1934) and 
Kendall (1953), before computers could be used for the statistical calculations. 


4.9.1 Lags 1-30


Table 4.8 presents information about the autocorrelation estimates for the twenty
series of daily returns, calculated for all lags \(\tau\) between 1 and 30 trading days.
To summarize their signs and magnitudes, they are here assigned to one of six
categories:

(i) \(\hat{\rho} < -0.1\);
(ii) \(-0.1 < \hat{\rho} < -0.05\);
(iii) \(-0.05 < \hat{\rho} < 0\);
(iv) \(0 < \hat{\rho} < 0.05\);
(v) \(0.05 < \hat{\rho} < 0.1\);
(vi) \(0.1 < \hat{\rho}\).

The numbers of autocorrelations in each of the six categories are tabulated, along
with the values for the first five lags.
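
The assignment to categories can be sketched as a lookup against the five boundaries. The treatment of an estimate that falls exactly on a boundary is not specified above, so the half-open intervals used here are an assumption, as are the names.

```python
from bisect import bisect_right
from collections import Counter

# Boundaries between the six categories.
BOUNDARIES = [-0.1, -0.05, 0.0, 0.05, 0.1]


def category(rho):
    """Return the category (1 to 6) of an autocorrelation estimate.

    Assumption: a value equal to a boundary is placed in the higher category.
    """
    return bisect_right(BOUNDARIES, rho) + 1


# First five lags of the spot S&P 500 series, from Table 4.8.
sp500_spot = [0.101, -0.033, -0.026, -0.020, 0.056]
categories = [category(rho) for rho in sp500_spot]
counts = Counter(categories)

print(categories)
print({k: counts.get(k, 0) for k in range(1, 7)})
```

Applying the same count to all thirty lags of a series gives one row of the category frequencies in Table 4.8.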

The estimated autocorrelations of returns are seen to be very small numbers. 
More than 90% of the 600 estimates summarized in Table 4.8 are between —0.05 
and 0.05. Some 99% of the estimates are between —0.1 and 0.1. Any linear 
dependence in the stochastic process generating daily returns must be considered 
small. There are similar numbers of positive and negative estimates; 47% are
positive for the spot series and the comparable figure is 55% for the futures
series.

Figure 4.5 is a typical scatter diagram for returns in consecutive periods. The 
gold futures returns in periods t and t + 1 respectively define the variables on the 
horizontal and vertical axes. There appears to be no linear dependence between the 
variables and it is difficult to see any evidence for a form of nonlinear dependence. 
Patterns will be visible in a scatter diagram, however, when the price changes are 
usually a small multiple of the asset's effective tick size (Crack and Ledoit 1996); 
such patterns are created by discrete prices and are not evidence of dependence 
between returns. 

The extreme observations at the time of the October 1987 crash have a notice- 
able impact on only a handful of the autocorrelation estimates. Any crash effect 
has been assessed by comparing the autocorrelations for complete series with 
the weighted averages of autocorrelations calculated first for the months until 
September 1987 and second for the months from November 1987; the weights 
are proportional to the numbers of returns in the two subsamples. The first-lag, 
US spot index autocorrelation changes from 0.101 to 0.099 when the crash month 
is excluded, the UK estimate changes from 0.066 to 0.071, and the Japanese esti- 
mate from 0.035 to 0.076. The estimate furthest from zero in Table 4.8 is —0.117, 
for the second-lag autocorrelation of S&P 500 futures returns. This exceptional 
estimate is determined by crash returns, since it changes to 0.008 when the crash 
month is removed from the series.

Figure 4.5. Gold returns in consecutive periods.

4.9.2 Lag 1: Daily Equity Series 


Most of the first-lag equity autocorrelations in Table 4.8 confirm three results 
noted by several researchers for daily returns: large firms generally have positive 
estimates, portfolios and hence spot indices have higher estimates than individual 
securities, and futures on indices have less dependence than spot indices. 

The average first-lag estimate across very large samples of US stocks is almost 
zero in the paper by French and Roll (1986). In their Table 3 they document an 
average of 0.003 for the twenty years from 1963 to 1983, while Lo and MacKinlay 
(1990a, p. 181) mention an average of —0.014 for the longer period from 1962 to 
1987. However, French and Roll make it clear that the magnitude of the first-lag 
estimates increases, on average, with the size of the firm. Their average estimates 
for five quintiles defined by firm size are —0.064 (smallest firms), —0.017, 0.012, 
0.025, and 0.054 (largest firms). This pattern of estimates is consistent with neg- 
ative dependence caused by bid—ask spreads (which are wider for smaller firms) 
and positive dependence in a factor that is common to all stocks. Blair et al. (2002) 
report a median first-lag value of 0.037 for a set of large firms, defined by all firms 
included in the S&P 100 index between 1983 and 1992; the interquartile range of 
the first-lag autocorrelations is from 0.008 to 0.075 and these summary statistics 
are not sensitive to inclusion of the crash day in the calculations.

Fisher (1966) and Scholes and Williams (1977) observed that returns from
portfolios of stocks will display positive dependence when the component stock 
prices depend on a common factor and are not contemporaneous. Returns from 
infrequently traded stocks will tend to reflect common information later than 
other stocks, so common information may be incorporated into portfolio returns 
on different days and hence induce positive autocorrelation. This issue of non- 
synchronous trading is analyzed in a theoretical setting by Lo and MacKinlay 
(1990b) with further analysis in Boudoukh, Richardson, and Whitelaw (1994). 
Small-firm portfolios suffer most from nonsynchronous trading and consequently 
display the most dependence. Conrad and Kaul (1989) report estimates for five 
size-based portfolios, each containing 400+ stocks. Their first-lag estimates for 
1962 to 1985 are 0.46 (smallest quintile), 0.40, 0.36, 0.35, and 0.20 (largest quin- 
tile). As equal-weighted portfolios over-represent thinly traded firms they too can 
have substantial first-lag dependence. For example, Campbell, Lo, and MacKin- 
lay (1997) report an estimate of 0.35 for the CRSP equal-weighted index from 
1962 to 1994, that can be compared with 0.18 for the comparable value-weighted 
index. 

Ahn et al. (2002) provide a detailed comparison of the first-lag autocorrelation 
for returns in recent years from 24 indices that have futures contracts written on 
their levels. The spot autocorrelation is higher than the corresponding futures value 
for each index, which is compatible with some stale prices in each index—futures 
prices then lead the spot index because the former is a traded asset unlike the latter. 
The differences between the spot and futures autocorrelations are significant at the 
5% level for 21 of the 24 indices. The S&P 500 has autocorrelations 0.03 (spot) 
and —0.04 (futures), for the FTSE 100 they are 0.08 and 0.02 and for the Nikkei 
225 the values are —0.00 and —0.02. The highest spot autocorrelations are for 
indices that have a large total weight on firms which trade relatively infrequently, 
for example, 0.22 for the Russell 2000 index of small US firms. 

First-lag equity estimates for US weekly returns have many similarities with 
estimates for daily returns. They are discussed by Conrad and Kaul (1989), Lo and 
MacKinlay (1990a), Boudoukh, Richardson, and Whitelaw (1994), and Campbell
et al. (1997). 


4.9.3 Tests 


The autocorrelation estimates can be used to test the hypothesis that the process
generating observed returns is a series of independent and identically distributed
(i.i.d.) random variables. The asymptotic theory in Section 3.9 informs us that
the standard error of an autocorrelation estimate is approximately \(1/\sqrt{n} = 0.02\)
when there are \(n = 2500\) observations from an i.i.d. process. Twenty-eight of the
hundred estimates given in Table 4.8 for the first five lags of the twenty series are
more than two standard errors (0.04) from zero and hence are significant at the
5% level. This suggests many of the series are not generated by an i.i.d. process.
The i.i.d. hypothesis can also be tested by using the portmanteau Q-statistic of
Box and Pierce (1970), calculated from the first \(k\) autocorrelations as

\[ Q_k = n \sum_{\tau=1}^{k} \hat{\rho}_{\tau,r}^2. \qquad (4.10) \]

The asymptotic distribution of the Q-statistic is chi-squared, with \(k\) degrees of
freedom, when the process is i.i.d. The values of Q when \(k = 30\) are listed
in Table 4.9; they reject the i.i.d. hypothesis at the 5% level for 14 of the 20
series. Thus again it appears that many of the returns processes are not i.i.d. The
i.i.d. hypothesis can be dealt with far more decisively, however, by testing the
autocorrelations of transformed returns, as we will soon see.

Table 4.9. Q-statistics calculated from thirty autocorrelations, for returns and transformed returns.

Series                             r       |r|       r^2   log(|r - r̄|)
S&P 500-share S                 66.2     975.5     214.3          151.5
S&P 500-share F                 90.0    1179.0     213.2          321.2
Coca Cola S                     70.1     974.5     967.0          511.2
General Electric S              53.1    1151.2    1014.2          293.0
General Motors S                50.8     558.6     507.9          141.5
FT 100-share S                  58.3    1055.4    1365.6          150.8
FT 100-share F                  35.4    1293.3     763.7          238.0
Glaxo S                         45.0    1059.8     894.1          282.9
Marks & Spencer S               37.8     241.5     417.8           72.0
Shell S                         43.0     594.7     653.7          130.0
Nikkei 225-share S              71.1    2950.3     380.6         1653.9
Treasury bonds F                41.2     926.8     643.4          371.6
3-month sterling bills F        76.0     991.0      75.1          866.5
DM/$ F                          49.3     314.6     236.3           85.5
Sterling/$ F                    38.7     433.4     508.3          151.5
Swiss franc/$ F                 45.7     262.6     206.5          104.9
Yen/$ F                         37.1     289.9     112.0          195.4
Gold F                          67.9    1999.0    1139.4          884.8
Corn F                          84.0    2385.6    4785.0         1303.3
Live cattle F                   46.4    1128.6    1152.2          513.8

The numbers tabulated are the series length multiplied by the sum of squared autocorrelations,
summing across the first thirty lags. The null hypothesis of i.i.d. returns is tested by comparing
the tabulated numbers with \(\chi^2_{30}\). The null hypothesis is rejected at the 5% level if Q > 43.77
and at the 1% level if Q > 50.89.
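
The Q-statistic in (4.10) can be computed as sketched below. The function names are hypothetical, the critical values are those quoted in the notes to Table 4.9, and the data are simulated i.i.d. normal returns rather than any of the twenty series.

```python
import math
import random

CHI2_30_CRITICAL_5PCT = 43.77  # quoted in the notes to Table 4.9
CHI2_30_CRITICAL_1PCT = 50.89


def sample_autocorrelation(x, lag):
    """Sample autocorrelation at a positive lag, following definition (4.9)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    numerator = sum(dev[t] * dev[t + lag] for t in range(n - lag))
    return numerator / sum(d * d for d in dev)


def box_pierce_q(x, k):
    """Box-Pierce statistic (4.10): n times the sum of the first k squared autocorrelations."""
    n = len(x)
    return n * sum(sample_autocorrelation(x, lag) ** 2 for lag in range(1, k + 1))


def iid_standard_error(n):
    """Approximate standard error of an autocorrelation estimate for an i.i.d. process."""
    return 1.0 / math.sqrt(n)


# Hypothetical i.i.d. returns with a 1% daily standard deviation.
rng = random.Random(1)
simulated = [rng.gauss(0.0, 0.01) for _ in range(2500)]

q30 = box_pierce_q(simulated, 30)
print(f"standard error for n = 2500: {iid_standard_error(2500):.2f}")
print(f"Q(30) = {q30:.1f}; reject i.i.d. at 5%: {q30 > CHI2_30_CRITICAL_5PCT}")
```

Applying `box_pierce_q` to absolute or squared returns instead of returns gives the kind of statistic reported in the other columns of Table 4.9.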

The standard error of an autocorrelation estimate used in tests of the i.i.d.
hypothesis cannot be used in tests of the interesting hypothesis that returns are
uncorrelated. An appropriate standard error when there are \(n\) observations is more
than \(1/\sqrt{n}\) because returns are conditionally heteroskedastic. This conclusion is
established in the next chapter and then followed by appropriate autocorrelation
tests of the random walk hypothesis.


4.10 Autocorrelations of Transformed Returns 


Functions of returns can have substantial autocorrelations even though returns
have very small autocorrelations. The evidence for such nonlinear dependence is
obtained here by considering the autocorrelations of various powers of absolute
returns, \(|r_t|\). We consider these autocorrelations for two reasons. The first is to
demonstrate beyond doubt that daily returns are not produced by an i.i.d. process.
The second is to document an important characteristic of daily returns that must
be considered when developing models, namely the third major stylized fact:


3. There is positive dependence between absolute returns on nearby days, and
likewise for squared returns.


Autocorrelations are discussed for time series \(\{|r_t|^\lambda\}\) with a particular interest in
the two cases \(\lambda = 1, 2\). These series are observations from an i.i.d. process
whenever the returns come from an i.i.d. process. Consequently, the same large-sample
theory is applicable to the transformed series when testing the i.i.d. hypothesis,
provided the moments \(E[|r_t|^{2\lambda}]\) are finite. Granger and Andersen (1978)
suggested there would be informative results when \(\lambda = 2\), which was confirmed
in Taylor (1982b) and followed by numerous results for the powers \(\lambda = 1, 2\) in
Taylor (1986). All positive numbers \(\lambda\) are discussed by Ding, Granger, and Engle
(1993).

The limit of the autocorrelations as \(\lambda \to 0\) is important when developing
and estimating certain volatility models, as first noted by Scott (1987). The limit is
the same as the autocorrelations of \(\log(|r_t|)\) when considering random variables
having continuous density functions, since \((|r|^\lambda - 1)/\lambda \to \log(|r|)\) as \(\lambda \to 0\). The
limit cannot be calculated from data when some returns are zero. To avoid this
difficulty we calculate the autocorrelations from the following transformation
of mean-adjusted returns, which defines “logarithmic absolute returns”:
\(l_t = \log(|r_t - \bar{r}|)\). The autocorrelations of the series \(\{l_t\}\) will be interpreted as the
appropriate numbers for the special case \(\lambda = 0\). When \(\lambda\) is positive it makes very
little difference whether or not the returns are mean-adjusted.
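
The transformations can be illustrated on a small artificial series. The example below is a contrived, deterministic sketch with hypothetical names: forty calm days followed by forty turbulent days, with a repeating sign pattern chosen so that the returns themselves are almost uncorrelated. It is not one of the twenty series.

```python
import math


def sample_autocorrelation(x, lag):
    """Sample autocorrelation at a positive lag, following definition (4.9)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    numerator = sum(dev[t] * dev[t + lag] for t in range(n - lag))
    return numerator / sum(d * d for d in dev)


def transformed_series(returns, lam):
    """Return |r|^lam for lam > 0, or log|r - mean| for the special case lam == 0."""
    if lam > 0:
        return [abs(r) ** lam for r in returns]
    if lam == 0:
        mean = sum(returns) / len(returns)
        # Fails if a return equals the sample mean exactly, since log(0) is undefined.
        return [math.log(abs(r - mean)) for r in returns]
    raise ValueError("lam must be non-negative")


# Artificial returns: 40 calm days (1% moves) then 40 turbulent days (3% moves).
signs = [1, 1, -1, -1] * 10
returns = [s * 0.01 for s in signs] + [s * 0.03 for s in signs]

rho_returns = sample_autocorrelation(returns, 1)
rho_abs = sample_autocorrelation(transformed_series(returns, 1), 1)
rho_squared = sample_autocorrelation(transformed_series(returns, 2), 1)
rho_log = sample_autocorrelation(transformed_series(returns, 0), 1)

print(f"returns: {rho_returns:.4f}")
print(f"absolute: {rho_abs:.4f}, squared: {rho_squared:.4f}, log absolute: {rho_log:.4f}")
```

For this series the first-lag autocorrelation of returns is 0.0175, while all three transformed series give 0.9625, because each takes only two values and changes level once. Real returns show the same contrast in a much weaker form.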


4.10.1 Lags 1 to 5


Summaries of the autocorrelation estimates for the series \(\{|r_t|\}\), \(\{r_t^2\}\), and \(\{l_t\}\)
are respectively presented in Tables 4.10, 4.11, and 4.12. Each table presents
100 estimates for the first five lags of twenty series. Almost all the estimates are
positive: 100 for absolute returns, 100 for squared returns, and 98 for logarithmic
absolute returns. A substantial majority of the estimates exceed 0.04 and hence
reject the i.i.d. hypothesis at the 5% significance level: 98 for absolute returns, 95
for squared returns, and 83 for logarithmic absolute returns.

Table 4.10. Autocorrelations for absolute returns.

                           Lags 1-5                                   Lags 1-30
                           Autocorrelations                           Category frequency
Series                            1       2       3       4       5       1    2    3    4    5    6
S&P 500-share S               0.159   0.195   0.230   0.126   0.209       0    0    0    3   13   14
S&P 500-share F               0.249   0.286   0.226   0.149   0.197       0    0    0    0   19   11
Coca Cola S                   0.329   0.162   0.145   0.159   0.158       0    0    0    3   15   12
General Electric S            0.224   0.211   0.176   0.159   0.168       0    0    0    0   12   18
General Motors S              0.204   0.161   0.146   0.116   0.101       0    0    0    7   15    8
FT 100-share S                0.298   0.251   0.180   0.184   0.133       0    0    0   12    3   15
FT 100-share F                0.277   0.265   0.205   0.150   0.160       0    0    0    3   11   16
Glaxo S                       0.247   0.163   0.138   0.150   0.160       0    0    0    2   13   15
Marks & Spencer S             0.155   0.078   0.075   0.082   0.075       0    0    0   18   11    1
Shell S                       0.196   0.161   0.159   0.117   0.113       0    0    0    4   21    5
Nikkei 225-share S            0.315   0.285   0.276   0.277   0.257       0    0    0    0    0   30
Treasury bonds F              0.086   0.112   0.137   0.148   0.184       0    0    0    0   13   17
3-month sterling bills F      0.232   0.228   0.189   0.151   0.136       0    0    0    4   13   13
DM/$ F                        0.054   0.063   0.080   0.090   0.000       0    0    0   10   18    2
Sterling/$ F                  0.101   0.079   0.086   0.106   0.081       0    0    0    6   20    4
Swiss franc/$ F               0.032   0.027   0.071   0.061   0.092       0    0    1   11   16    2
Yen/$ F                       0.113   0.069   0.121   0.059   0.111       0    0    0   13   14    3
Gold F                        0.210   0.181   0.183   0.200    0.15       0    0    0    0    0   30
Corn F                        0.311   0.285   0.230   0.255   0.289       0    0    0    0    0   30
Live cattle F                 0.083   0.156   0.141   0.143   0.121       0    0    0    0    8   22

Autocorrelation averages and frequency totals
Spot series S                 0.236   0.185   0.169   0.152   0.154       0    0    0   49  103  118
Futures series F              0.159   0.159   0.152   0.137   0.151       0    0    1   47  132  150
All series                    0.194   0.171   0.160   0.144   0.152       0    0    1   96  235  268
All series, crash excluded    0.115   0.107   0.107   0.113   0.113       0    0    6  130  287  177

These estimates are sensitive to the inclusion of the crash month (October 1987) 
in the calculations. As the final row in each table shows, exclusion of the crash 
month (as described previously for returns) lowers the average estimates. This 
is particularly so for absolute and squared returns at lags 1 and 2. The lower 
estimates are to be expected because the crash period contains a cluster of higher 
than usual absolute returns. Removing the crash month causes the first-lag stock 
index estimates for absolute returns to decrease substantially; for the US from 
0.16 to 0.06 (spot) and 0.25 to 0.09 (futures); for the UK from 0.30 to 0.10 (spot) 
and 0.28 to 0.12 (futures); and for Japan from 0.31 to 0.25 (spot). The numbers 
of positive and significant estimates change little with the removal of the crash 
month. There are then 100, 100, and 96 positive estimates, of which 97, 93, and 
72 are greater than $2/\sqrt{n} = 0.04$.
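
The comparison of first-lag estimates with $2/\sqrt{n}$ can be sketched in a few lines. This is only an illustration under stated assumptions: the returns are simulated from a hypothetical process with slowly changing volatility (they are not the series in the tables), and the names `autocorr` and `simulate_returns` are invented for the example.

```python
import math
import random

def autocorr(x, lag):
    """Sample autocorrelation of the sequence x at a positive lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    numer = sum(dev[t] * dev[t + lag] for t in range(n - lag))
    return numer / denom

def simulate_returns(n, seed=1):
    """Hypothetical returns: Gaussian noise scaled by a persistent volatility."""
    rng = random.Random(seed)
    log_vol = 0.0
    out = []
    for _ in range(n):
        # AR(1) log-volatility; the parameters 0.95 and 0.2 are assumptions.
        log_vol = 0.95 * log_vol + 0.2 * rng.gauss(0.0, 1.0)
        out.append(math.exp(log_vol) * rng.gauss(0.0, 1.0))
    return out

r = simulate_returns(2500)
threshold = 2.0 / math.sqrt(len(r))   # equals 0.04 when n = 2500
series = {
    "returns": r,
    "absolute": [abs(v) for v in r],
    "squared": [v * v for v in r],
}
for name, x in series.items():
    rho1 = autocorr(x, 1)
    print(f"{name:9s} lag 1: {rho1:+.3f}  above 2/sqrt(n): {rho1 > threshold}")
```

With real data the list `r` would simply be replaced by the observed daily returns.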

Figure 4.6 is a scatter plot of absolute returns for gold in consecutive periods.
The correlation between $|r_t|$ and $|r_{t+1}|$ is 0.21 on this figure and it may not appear,
Table 4.11. Autocorrelations for squared returns.


                                      Lags 1-5                   Lags 1-30
                                  Autocorrelations           Category frequency

Series                          1     2     3     4     5    1  2  3   4   5   6
S&P 500-share             S  0.074 0.181 0.097 0.017 0.168   0  0  2  24   2   2
S&P 500-share             F  0.082 0.258 0.058 0.011 0.067   0  0  2  24   3   1
Coca Cola                 S  0.545 0.200 0.088 0.100 0.110   0  0  1  22   3   4
General Electric          S  0.303 0.365 0.217 0.105 0.185   0  0  0  12  12   6
General Motors            S  0.398 0.111 0.104 0.036 0.056   0  0  0  25   2   3
FT 100-share              S  0.603 0.274 0.140 0.153 0.107   0  0  1  15   9   5
FT 100-share              F  0.348 0.266 0.201 0.065 0.080   0  0  0  17   9   4
Glaxo                     S  0.414 0.127 0.085 0.083 0.128   0  0  1  11  12   6
Marks & Spencer           S  0.288 0.149 0.077 0.002 0.077   0  0  1  18   9   2
Shell                     S  0.293 0.225 0.215 0.089 0.113   0  0  0  20   5   5
Nikkei 225-share          S  0.231 0.091 0.117 0.127 0.080   0  0  0  14  13   3
Treasury bonds            F  0.123 0.131 0.111 0.123 0.197   0  0  0   2  17  11
3-month sterling bills    F  0.054 0.115 0.059 0.014 0.020   0  0  3  23   3   1
DM/$                      F  0.059 0.094 0.049 0.056 0.059   0  0  0  17  12   1
Sterling/$                F  0.102 0.093 0.084 0.125 0.061   0  0  0   6  17   7
Swiss franc/$             F  0.038 0.050 0.059 0.038 0.073   0  0  0  20   9   1
Yen/$                     F  0.071 0.052 0.108 0.028 0.065   0  0  3  22   4   1
Gold                      F  0.180 0.149 0.168 0.158 0.167   0  0  0   0   8  22
Corn                      F  0.405 0.309 0.199 0.219 0.321   0  0  0   0   0  30
Live cattle               F  0.088 0.170 0.160 0.151 0.135   0  0  0   0  10  20

Autocorrelation averages and frequency totals
Spot series               S  0.350 0.191 0.127 0.089 0.114   0  0  6 161  67  36
Future series             F  0.141 0.153 0.114 0.090 0.113   0  0  8 131  92  99
All series                   0.235 0.170 0.120 0.089 0.114   0  0 14 292 159 135
All series, crash excluded   0.099 0.096 0.086 0.093 0.091   0  0 20 232 241 107


at first sight, that there is any dependence between the variables. As $|r_t|$ increases,
there is more chance of a high value of $|r_{t+1}|$ and this may be discerned in the
figure. To emphasize the dependence, Figure 4.7 shows the expectation of $|r_{t+1}|$
conditional upon observing $|r_t| > c$ for various c between 0.1 and 4%. These
expectations are marked by diamonds and increase almost linearly with c. The
figure also shows the expectation of $r_{t+1}$ given that $r_t > c$, marked by squares,
and the expectation of $r_{t+1}$ given $r_t < -c$, marked by triangles.
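
The conditional expectations plotted in Figure 4.7 can be estimated by simple averaging. The sketch below is a minimal illustration, assuming a short made-up list of returns rather than the gold series; the function name `conditional_mean_next_abs` is hypothetical.

```python
def conditional_mean_next_abs(returns, c):
    """Average of |r[t+1]| over all t with |r[t]| > c, or None if no t qualifies."""
    following = [
        abs(returns[t + 1])
        for t in range(len(returns) - 1)
        if abs(returns[t]) > c
    ]
    return sum(following) / len(following) if following else None

# Made-up returns (in percent) with two short bursts of large changes.
toy_returns = [0.2, -0.1, 1.5, -2.0, 0.3, 0.1, -1.8, 2.2, 0.2]

for c in (0.0, 1.0):
    print(c, conditional_mean_next_abs(toy_returns, c))
```

Replacing `abs(returns[t + 1])` by `returns[t + 1]` and the condition by `returns[t] > c` or `returns[t] < -c` gives the two signed expectations that the figure marks with squares and triangles.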


4.10.2 Lags 1 to 30


The category frequencies in Tables 4.10—4.12 summarize the estimates for the first 
thirty lags. The 600 estimates summarized in each table are almost all positive, the 
totals being 599, 586, and 586. Excluding the crash month reduces these totals 
to 594, 580, and 578. All estimates in categories five and six exceed 0.05 and 
are significant at the 1% level for tests of the i.i.d. hypothesis. The three tables 
have 503, 294, and 352 estimates in these classes compared with the 36 estimates 
for returns that fall outside the range from —0.05 to 0.05. These comparisons of 
Figure 4.6. Gold returns: absolute values in consecutive periods.
Figure 4.7. Gold returns: selected conditional expectations.
Table 4.12. Autocorrelations for logarithms of absolute, mean-adjusted returns.


                                      Lags 1-5                   Lags 1-30
                                  Autocorrelations           Category frequency

Series                          1      2     3     4     5    1  2  3   4   5   6
S&P 500-share             S  0.005  0.004 0.040 0.034 0.083   0  0  0  19  11   0
S&P 500-share             F  0.024  0.068 0.077 0.063 0.090   0  0  0   6  24   0
Coca Cola                 S  0.137  0.085 0.049 0.068 0.083   0  0  0   2  24   4
General Electric          S  0.078  0.073 0.060 0.070 0.081   0  0  0  10  20   0
General Motors            S  0.057  0.069 0.065 0.080 0.047   0  0  1  21   8   0
FT 100-share              S  0.039  0.073 0.043 0.055 0.056   0  0  1  19  10   0
FT 100-share              F  0.071  0.068 0.059 0.036 0.061   0  0  0  16  14   0
Glaxo                     S  0.095  0.087 0.062 0.070 0.076   0  0  0  11  19   0
Marks & Spencer           S  0.060  0.012 0.046 0.043 0.028   0  0  5  22   3   0
Shell                     S  0.057  0.059 0.053 0.041 0.036   0  0  0  22   8   0
Nikkei 225-share          S  0.178  0.201 0.177 0.171 0.174   0  0  0   0   0  30
Treasury bonds            F  0.034  0.062 0.071 0.094 0.081   0  0  0   5  25   0
3-month sterling bills    F  0.178  0.168 0.145 0.157 0.131   0  0  0   0  18  12
DM/$                      F  0.022 -0.004 0.057 0.042 0.058   0  0  4  21   5   0
Sterling/$                F  0.048  0.053 0.055 0.048 0.067   0  0  0  19  11   0
Swiss franc/$             F  0.019 -0.020 0.026 0.035 0.066   0  0  2  23   5   0
Yen/$                     F  0.085  0.050 0.076 0.081 0.076   0  0  1  17  12   0
Gold                      F  0.120  0.115 0.122 0.136 0.110   0  0  0   0  10  20
Corn                      F  0.170  0.150 0.151 0.139 0.169   0  0  0   0   4  26
Live cattle               F  0.054  0.067 0.109 0.066 0.076   0  0  0   1  23   6

Autocorrelation averages and frequency totals
Spot series               S  0.078  0.074 0.066 0.070 0.074   0  0  7 126 103  34
Future series             F  0.075  0.071 0.086 0.081 0.089   0  0  7 108 151  64
All series                   0.077  0.072 0.077 0.076 0.082   0  0 14 234 254  98
All series, crash excluded   0.062  0.059 0.063 0.064 0.069   0  0 22 307 202  69


significant autocorrelation estimates demonstrate the validity of the third stylized 
fact—there is far more linear dependence among the transformed returns than 
among the returns themselves. 

The abundant numbers of significant estimates for the transformed series
inevitably ensure that the Q-statistics are very large when compared with the
appropriate $\chi^2$ distribution. Table 4.9 shows values for absolute returns,

$$Q_{30,|r|} = n \sum_{\tau=1}^{30} \hat{\rho}_{\tau,|r|}^{2},$$

and the other transformed series. The values of Q are always more than 240 for the
series of absolute returns so the i.i.d. hypothesis must be false. Furthermore, the
differences $Q_{30,|r|} - Q_{30,r}$ all exceed 200, emphasizing that there is far more linear
dependence in the process generating transformed returns than in the process for
returns.
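
A minimal sketch of the Q-statistic calculation follows, assuming simulated returns with changing volatility in place of the series of Table 4.9. The helper names and simulation parameters are hypothetical; the constant 43.77 is the upper 5% point of the $\chi^2$ distribution with 30 degrees of freedom.

```python
import math
import random

def autocorr(x, lag):
    """Sample autocorrelation of x at a positive lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    return sum(dev[t] * dev[t + lag] for t in range(n - lag)) / sum(d * d for d in dev)

def q_statistic(x, max_lag):
    """n times the sum of squared autocorrelations at lags 1, ..., max_lag."""
    return len(x) * sum(autocorr(x, lag) ** 2 for lag in range(1, max_lag + 1))

def simulate_returns(n, seed=1):
    """Hypothetical returns with persistent (AR(1)) log-volatility."""
    rng = random.Random(seed)
    log_vol, out = 0.0, []
    for _ in range(n):
        log_vol = 0.95 * log_vol + 0.2 * rng.gauss(0.0, 1.0)
        out.append(math.exp(log_vol) * rng.gauss(0.0, 1.0))
    return out

CHI2_30_5PCT = 43.77                      # 5% critical value, 30 degrees of freedom
r = simulate_returns(2500)
q_returns = q_statistic(r, 30)
q_absolute = q_statistic([abs(v) for v in r], 30)
print(f"Q(returns) = {q_returns:.1f}, Q(absolute returns) = {q_absolute:.1f}")
```

The difference `q_absolute - q_returns` corresponds to the differences between Q-statistics discussed in the text.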
Figure 4.9. S&P 500 spot autocorrelations.
Figure 4.10. Yen autocorrelations.


Figures 4.8-4.11 are plots of the autocorrelations of returns (marked with
squares), absolute returns (triangles, joined by solid lines), squared returns
(diamonds) and logarithmic absolute returns (circles). The gold estimates in
Figure 4.8 show that the absolute returns and the returns respectively have the
highest and lowest estimates for gold at all of the first thirty lags. The picture
is less clear in Figure 4.9 for the S&P 500 spot estimates, although the absolute
returns do have the highest estimates for twenty-nine of the thirty lags. The yen
estimates, in Figure 4.10, are smaller than in the other figures and the lines
joining the symbols frequently intersect. The highest estimate is for absolute
returns more often than for the other series. Figure 4.11 shows averages of the
estimates obtained from the twenty series. In this figure the average for returns
is always the one nearest zero. The other averages are highest for squared returns
at the first lag and thereafter for absolute returns.


4.10.3 Lags 1 to 625 
Ding et al. (1993) present results for the autocorrelations of $|r_t|^{\lambda}$ calculated from
a series of more than sixty years of daily S&P 500 returns. Fractional powers
$\lambda$ are investigated within the range 0.125 to 5. The maximum correlation occurs
when $\lambda$ is near 1, for at least the first 100 lags. The absolute returns have positive
autocorrelation up to lag 2705, a remarkable result that may be largely due to
a higher level of average absolute returns before the year 1940 than afterwards.
A subsequent paper by Granger and Ding (1995) extends the investigation of
fractional powers $\lambda$ to commodity and exchange rate series. They show that
the maximum correlation often occurs when $\lambda$ is close to 1.

Figure 4.12. Autocorrelations of absolute and squared returns; averages across 20 series.

Autocorrelations up to lag 625 have been calculated for the series discussed
throughout this chapter, to obtain further results about the power $\lambda$ giving the
most linear dependence in each series $|r_t|^{\lambda}$. The Q-statistic defined by

$$Q_k = n \sum_{\tau=1}^{k} \left[\mathrm{cor}\left(|r_t|^{\lambda}, |r_{t+\tau}|^{\lambda}\right)\right]^2 \tag{4.11}$$

has been maximized for k = 1, 5, 25, 125, 625 by calculating Q for $\lambda$ = 0.1, 0.2,
0.3, ..., 5. The powers $\lambda_{\max}$ that maximize Q are shown in Table 4.13.
Three-quarters of the numbers $\lambda_{\max}$ are within the range 0.7 to 1.3 when 625 lags are
used and the average value of $\lambda_{\max}$ is then 1.01. The numbers $\lambda_{\max}$ do not change
much when the crash month is excluded, providing 25 or more lags are used.
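
The grid search over powers can be sketched as follows. This is an illustration only, assuming simulated returns and sample autocorrelations in place of the correlations in (4.11); `q_for_power` and `best_power` are hypothetical names.

```python
import math
import random

def autocorr(x, lag):
    """Sample autocorrelation of x at a positive lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    return sum(dev[t] * dev[t + lag] for t in range(n - lag)) / sum(d * d for d in dev)

def q_for_power(returns, power, k):
    """Q-statistic over lags 1..k for absolute returns raised to a power."""
    x = [abs(v) ** power for v in returns]
    return len(x) * sum(autocorr(x, lag) ** 2 for lag in range(1, k + 1))

def best_power(returns, k):
    """Power on the grid 0.1, 0.2, ..., 5.0 that maximizes the Q-statistic."""
    powers = [i / 10 for i in range(1, 51)]
    return max(powers, key=lambda p: q_for_power(returns, p, k))

# Hypothetical returns with persistent volatility (parameters are assumptions).
rng = random.Random(3)
log_vol, r = 0.0, []
for _ in range(1500):
    log_vol = 0.95 * log_vol + 0.2 * rng.gauss(0.0, 1.0)
    r.append(math.exp(log_vol) * rng.gauss(0.0, 1.0))

lam_max = best_power(r, 5)
print("power maximizing Q over 5 lags:", lam_max)
```

Repeating the call with k = 1, 25, 125, and 625 on a long observed series would give one row of a table laid out like Table 4.13.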

Table 4.13. Powers that maximize Q-statistics.


                                 Q calculated from lags 1 to...

Series                          1     5     25    125   625
S&P 500-share             S    1.2   1.2   1.0   0.9   0.8
S&P 500-share             F    1.1   1.1   0.9   0.7   0.7
Coca Cola                 S    2.0   1.8   1.5   1.0   0.8
General Electric          S    1.8   1.8   1.5   1.2   1.1
General Motors            S    2.1   1.9   1.4   1.0   0.9
FT 100-share              S    2.3   1.9   1.6   1.5   1.3
FT 100-share              F    1.6   1.5   1.2   1.2   1.0
Glaxo                     S    2.6   2.1   1.3   1.0   0.9
Marks & Spencer           S    3.0   2.8   2.7   2.3   1.7
Shell                     S    2.1   2.0   1.7   1.5   1.3
Nikkei 225-share          S    1.0   0.8   0.7   0.6   0.6
Treasury bonds            F    5.0   1.5   1.0   0.9   0.9
3-month sterling bills    F    0.7   0.7   0.7   0.6   0.6
DM/$                      F    1.7   1.1   1.2   1.4   1.1
Sterling/$                F    1.5   1.6   1.7   1.7   1.4
Swiss franc/$             F    1.9   1.3   1.2   1.2   1.2
Yen/$                     F    0.8   0.9   0.7   0.6   0.7
Gold                      F    1.0   1.0   0.8   0.8   0.7
Corn                      F    2.2   [illegible]  1.8   1.7   1.4
Live cattle               F    5.0   2.8   1.6   1.1   1.1
Averages                       2.08  1.58  1.31  1.13  1.01
Averages, crash excluded       1.55  1.29  1.18  1.10  1.02

Each tabulated number is the power that gives the most dependence in absolute returns raised
to a positive power, with dependence summarized by the sum of squared autocorrelations,
summing across the first 1, 5, 25, 125, or 625 lags.


Figure 4.12 shows the across-series averages of the autocorrelations for absolute
and squared returns up to lag 300. The figure shows that absolute returns have
more dependence than squared returns for a considerable number of lags. It also
shows the across-series averages are positive for many lags. The first negative
averages for selected $\lambda$ are at lags 196 ($\lambda$ = 0.5, 1, 1.5), 168 ($\lambda$ = 2), and 89
($\lambda$ = 2.5). The averages are close to zero for all lags above 200 whatever the
power $\lambda$. From lags 300 to 625 a majority of the averages are negative, and this is
a consequence of the theoretical finite-sample bias that is described in Section 5.6.
The slow decline shown by the averages in Figure 4.12 is typical of long-memory
effects, but it would be wrong to conclude from the figure that the individual series
have a long-memory property.


4.10.4 Characteristics of Returns Time Series 


The tables and figures show that while very little autocorrelation is present in series
of returns $\{r_t\}$, substantially more autocorrelation is found in series of absolute
returns $\{|r_t|\}$. The autocorrelations of absolute returns are always positive at a lag
of one day and positive dependence continues to be found for several further lags.
Power transformations of absolute returns, including squared returns $\{r_t^2\}$, also
display positive dependence but generally to a lesser degree. These conclusions
are characteristics of all the series of returns studied here. Most and often all of
these conclusions apply to any long series of daily returns from a financial asset
that is traded frequently.

The high dependence in series of absolute returns proves that the returns process 
is not made up of independent and identically distributed random variables. It does 
not provide conclusions about the random walk and efficient market hypotheses. 
Large absolute returns are more likely than small absolute returns to be fol- 
lowed by large absolute returns, but this result alone cannot be used to predict the 
direction of price changes. Early statements of similar conclusions are made by 
Mandelbrot (1963) and Fama (1965). 

There is a simple explanation for the dependence found in series of absolute 
returns, based upon the volatility clusters noted in Section 2.4. Changes in price 
volatility create clusters of high and low volatility, that may reflect changes in the 
flow of relevant information to the market. There are then some periods during 
which returns are relatively more variable and hence expected absolute returns 
are relatively high. In other periods, returns have relatively little dispersion and 
expected absolute returns are relatively low. The clustering of expected abso- 
lute returns causes positive dependence in the observed data. This explanation is 
developed in detail in Chapters 8-11. 

Other explanations are less easy to motivate. Weekend and other calendar effects 
cannot explain the dependence in absolute and squared returns (Appendix 4.13). 
Neither can a linear, correlated process provide a satisfactory explanation (Sec- 
tion 4.11). 


4.10.5 Consequences


The consequences of positive dependence between absolute (or squared) returns
on nearby days are essentially the same as the consequences of volatility
clustering. All econometric methods and financial applications that assume returns
are i.i.d. need to be reconsidered. A few examples are mentioned here.

One example arises when testing the interesting hypothesis that returns on
different days are not correlated with each other. As this version of the random walk
hypothesis is weaker than the statement that returns are i.i.d., the standard errors
of the estimates $\hat{\rho}_{\tau,r}$ obtained from n returns need not be $1/\sqrt{n}$ when the weaker
hypothesis applies. Instead estimates of appropriate standard errors can exceed
$2/\sqrt{n}$ (Chapter 5). Another example relates to estimating the density function
of returns. The likelihood function is only the product of unconditional
densities when returns are i.i.d. As they are not, likelihood inferences and parameter
estimates must be based upon a suitable product of conditional density functions
(Chapters 9-11). As volatility is to some extent predictable, the continuous-time
processes required to derive the Black-Scholes and related option pricing formulae
are incorrectly specified. More sophisticated pricing formulae, which reflect
volatility expectations and dynamics, are required. These formulae are used by
traders (Chapters 14-16).


4.11 Nonlinearity of the Returns Process 


Recall that a stochastic process for returns is linear if it is possible to describe
returns by the equation

$$r_t = \mu + \varepsilon_t + \sum_{j=1}^{\infty} b_j \varepsilon_{t-j},$$

where $\mu$ and the $b_j$ are constants and $\{\varepsilon_t\}$ is a process of zero-mean i.i.d.
random variables. We have already shown that this model is indefensible if all the
coefficients $b_j$ are zero. More effort is required to reject the general linear model,
although many readers will surely be content to accept that Figures 4.8-4.12 show
autocorrelations that are inconsistent with linear theory.


4.11.1 Not Linear 


Let $s_t = (r_t - \mu)^2$ and assume the returns have finite kurtosis, $\kappa = E[s_t^2]/E[s_t]^2$.
The autocorrelations of the stochastic process $\{s_t\}$, denoted by $\rho_{\tau,s}$, are known
for stationary, Gaussian (and hence linear) processes to be the squares of the
return autocorrelations $\rho_{\tau,r}$, i.e. $\rho_{\tau,s} = \rho_{\tau,r}^2$ for any lag $\tau$ (Granger and Newbold
1976). Clearly, the empirical evidence is decisively against these equalities. The
mathematics is far less tidy for non-Gaussian, linear processes. Appendix 4.14
shows that for such processes there is a general result

$$\rho_{\tau,s} = \frac{\kappa - 3}{\kappa - 1}\,\alpha_\tau + \frac{2}{\kappa - 1}\,\rho_{\tau,r}^2 \tag{4.12}$$

with the positive numbers $\alpha_\tau$ determined by the $b_j$ alone. Thus $\rho_{\tau,s}$ is a weighted
average of $\rho_{\tau,r}^2$ and $\alpha_\tau$ with all weight given to the first term only when $\kappa = 3$.
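
Equation (4.12) can be checked numerically for any finite set of coefficients. The sketch below is an illustration under stated assumptions: it takes the covariance expression derived in Appendix 4.14 as $\mathrm{cov}(s_t, s_{t+\tau}) = (\lambda - 3)\sum b_i^2 b_{i+\tau}^2 + 2(\sum b_i b_{i+\tau})^2$, where $\lambda$ is the kurtosis of the innovations, and uses made-up coefficients; the function names are hypothetical.

```python
def cross(b, power, tau):
    """Sum over i of (b[i] * b[i + tau]) ** power for a finite coefficient list."""
    return sum((b[i] * b[i + tau]) ** power for i in range(len(b) - tau))

def rho_s_direct(b, lam, tau):
    """Autocorrelation of squared deviations from the covariance expression."""
    cov = (lam - 3) * cross(b, 2, tau) + 2 * cross(b, 1, tau) ** 2
    var = (lam - 3) * cross(b, 2, 0) + 2 * cross(b, 1, 0) ** 2
    return cov / var

def rho_s_weighted(b, lam, tau):
    """The same autocorrelation written as the weighted average in (4.12)."""
    sum_b2 = cross(b, 1, 0)                       # sum of b_i^2
    sum_b4 = cross(b, 2, 0)                       # sum of b_i^4
    kappa = 3 + (lam - 3) * sum_b4 / sum_b2 ** 2  # kurtosis of returns
    rho_r = cross(b, 1, tau) / sum_b2             # autocorrelation of returns
    alpha = cross(b, 2, tau) / sum_b4
    return ((kappa - 3) * alpha + 2 * rho_r ** 2) / (kappa - 1)

b = [1.0, 0.3, 0.1]     # hypothetical coefficients with b_0 = 1
for tau in (1, 2):
    print(tau, rho_s_direct(b, 6.0, tau), rho_s_weighted(b, 6.0, tau))
```

Setting the innovation kurtosis to 3 reproduces the Gaussian result that the autocorrelations of squared deviations equal the squared autocorrelations of returns.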

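An upper bound for sums of the $\alpha_\tau$ can be obtained from the proportional
reduction in mean square error obtained by the optimal forecasts for a linear
model, denoted by $\theta = \{\mathrm{var}(r_t) - \mathrm{var}(\varepsilon_t)\}/\mathrm{var}(r_t)$. It can be shown that

$$\alpha_\tau \ge 0 \ (\tau > 0) \quad \text{and} \quad \sum_{\tau=1}^{\infty} \alpha_\tau \le \frac{\theta(2-\theta)}{2(1-\theta)^2}. \tag{4.13}$$

As it is only necessary to consider leptokurtic distributions, $\kappa > 3$, equations
(4.12) and (4.13) give

$$\sum_{\tau=1}^{k} \rho_{\tau,s} \le \max\left(\sum_{\tau=1}^{k} \rho_{\tau,r}^2,\ \frac{\theta(2-\theta)}{2(1-\theta)^2}\right) \tag{4.14}$$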
for any positive integer k. To complete the argument against all linear processes
it is necessary to select the number of lags k and to offer a value for $\theta$. Further
rigor and significance tests do not seem necessary at this point. Table 4.11 gives
autocorrelation estimates for squared returns. These estimates are almost identical
to the autocorrelations of mean-adjusted squared returns, which are the natural
estimates of $\rho_{\tau,s}$. Inspection of Table 4.11 shows that the sample autocorrelations
contradict the inequality (4.14) for all series if k = 5 and $\theta$ = 0.15. The
contradictions remain when the crash month is excluded from the calculations. Increasing
k will increase the difference between the two sums in the inequality and thus
confirm that linear processes are not appropriate. The assumption $\theta \le 0.15$ is
innocuous since no one has found a forecast anywhere near 15% more accurate
than the random walk forecast for liquid traded assets.


4.12 Concluding Remarks 


Any satisfactory statistical model for daily returns must be consistent with three 
stylized facts that are of particular importance. First, the distribution of returns 
is approximately symmetric and has high kurtosis, fat tails and a peaked center 
compared with the normal distribution. Second, the autocorrelations of returns are 
all close to zero. Third, the autocorrelations of both absolute returns and squared 
returns are positive for many lags and they indicate substantially more linear 
dependence than the autocorrelations of returns. 

Several models examined in the remainder of this book are compatible with 
these stylized facts. The density of returns is a mixture of normal densities for 
many of these models, with the mixture defined by variation in volatility. The 
dependence among absolute and squared returns is then a consequence of slow 
changes in volatility. 
4.13 Appendix: Autocorrelation Caused by Day-of-the-Week Effects


Suppose the process generating daily returns depends on the day of the week
according to the following seasonal model:

$$r_t = \mu_t + \sigma_t \varepsilon_t, \quad \mu_t = \mu_{t+5}, \quad \sigma_t = \sigma_{t+5},$$

with $\{\varepsilon_t\}$ a stationary, zero-mean, unit-variance process. Autocorrelations
calculated from a sample of returns will depend on the calendar terms $\mu_i$, $\sigma_i$, $1 \le i \le 5$.
The numerical consequences of these day-of-the-week terms are, however, small.
Similar methods can be used to show that month-of-the-year effects are much
smaller.


4.13.1 Returns 


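As the sample size n increases, an autocorrelation estimate $\hat{\rho}_{\tau,r,n}$ defined for
n random variables $r_t$ by $\sum (r_t - \bar{r})(r_{t+\tau} - \bar{r}) / \sum (r_t - \bar{r})^2$ will converge with
probability 1 to a limit. Estimates calculated from observations will converge to
the same limit, which is here denoted by $\pi_{\tau,r}$. This limit would be the population
autocorrelation $\rho_{\tau,r}$ if the returns process was stationary. As n increases,

$$\bar{r} = \frac{1}{n}\sum_{t=1}^{n} r_t \to \frac{1}{5}\sum_{i=1}^{5} \mu_i = \bar{\mu}, \text{ say,}$$

and

$$\frac{1}{n}\sum_{t=1}^{n} (r_t - \bar{r})^2 \to \frac{1}{5}\sum_{i=1}^{5} \left[(\mu_i - \bar{\mu})^2 + \sigma_i^2\right].$$

Now assume $\varepsilon_t$ is uncorrelated with $\varepsilon_{t+\tau}$ whenever $\tau \ne 0$. Then it can be shown
that the daily calendar effects create the asymptotic, estimated autocorrelations:

$$\pi_{\tau,r} = \sum_{i=1}^{5} (\mu_i - \bar{\mu})(\mu_{i+\tau} - \bar{\mu}) \Big/ \sum_{i=1}^{5} \left[(\mu_i - \bar{\mu})^2 + \sigma_i^2\right], \quad \tau > 0. \tag{4.15}$$

Clearly, $\pi_{\tau,r} = \pi_{\tau+5,r}$ for all positive $\tau$; it can also be shown that $\pi_{1,r} = \pi_{4,r}$
and $\pi_{2,r} = \pi_{3,r}$.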
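
A short calculation shows how small the numbers given by (4.15) are for plausible inputs. The sketch assumes the limit has the form of a ratio of sums over the five weekdays, with $\mu_{i+\tau}$ read cyclically; the daily means and standard deviations below are invented for illustration and are not French's estimates.

```python
def calendar_autocorr(mu, sigma, tau):
    """Asymptotic autocorrelation at lag tau created by day-of-the-week terms."""
    days = len(mu)                                   # five trading days
    mu_bar = sum(mu) / days
    numer = sum(
        (mu[i] - mu_bar) * (mu[(i + tau) % days] - mu_bar) for i in range(days)
    )
    denom = sum((mu[i] - mu_bar) ** 2 + sigma[i] ** 2 for i in range(days))
    return numer / denom

# Hypothetical daily means and standard deviations, in percent, Monday to Friday.
mu = [-0.15, 0.02, 0.10, 0.05, 0.09]
sigma = [0.85, 0.75, 0.75, 0.70, 0.70]

for tau in range(1, 6):
    print(tau, round(calendar_autocorr(mu, sigma, tau), 4))
```

The output repeats with period five and is symmetric within the week, matching the equalities stated after (4.15).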

To emphasize the small magnitudes of the numbers $\pi_{\tau,r}$, suppose the daily
means and standard deviations equal the estimates of French (1980), tabulated in
Section 4.5. Then (4.15) gives these values for $\pi_{\tau,r}$:



4.13.2 Squared Returns 


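Let $\pi_{\tau,r^2}$ be the limit of estimates $\hat{\rho}_{\tau,r^2,n}$. Also, let $\gamma_i = E[r_i^2] = \mu_i^2 + \sigma_i^2$ and
$\bar{\gamma} = (\sum \gamma_i)/5$. Now assume the $\varepsilon_t$ are i.i.d. with $E[\varepsilon_t^3] = 0$ and $E[\varepsilon_t^4] = \lambda$.
Then it can be shown that

$$\pi_{\tau,r^2} = \sum_{i=1}^{5} (\gamma_i - \bar{\gamma})(\gamma_{i+\tau} - \bar{\gamma}) \Big/ \sum_{i=1}^{5} \left[(\gamma_i - \bar{\gamma})^2 + 4\mu_i^2\sigma_i^2 + (\lambda - 1)\sigma_i^4\right], \quad \tau > 0. \tag{4.16}$$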

Again using French's estimates, (4.16) gives the following low autocorrelations,
first for the Gaussian case $\lambda = 3$ and second for the leptokurtic case $\lambda = 6$:

$\tau$        $\lambda = 3$      $\lambda = 6$
1, 4     -0.006      -0.002
2, 3     [illegible] -0.001
5         0.017       0.007

A second example is provided by averaging the variance proportions given in
Table 4.8 for the four currency futures series (Monday 23.3%, Tuesday 19.6%,
Wednesday 16.8%, Thursday 18.8%, Friday 21.5%) with the mean effects
assumed to be zero:

$\tau$        $\lambda = 3$      $\lambda = 6$
1, 4      0.002       0.001
2, 3     -0.005      -0.002
5         0.006       0.002

4.14 Appendix: Autocorrelations of a Squared Linear Process 


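Suppose $\{r_t\}$ is linear with $r_t = \mu + \sum b_i \varepsilon_{t-i}$. All sums are over the
subscript i ($i \ge 0$) in this appendix, unless stated otherwise. Assume that $E[\varepsilon_t^4]$ is finite
and, without loss of generality, that the $\varepsilon_t$ have zero mean and unit variance and
also that $b_0 = 1$. Let $\lambda = E[\varepsilon_t^4]$ and define

$$s_t = (r_t - \mu)^2 = \sum_i b_i^2 \varepsilon_{t-i}^2 + 2 \sum_{i<j} b_i b_j \varepsilon_{t-i} \varepsilon_{t-j}.$$

Then $E[s_t] = \sum b_i^2$. To find the autocorrelations of $\{s_t\}$ we need to evaluate
$E[s_t s_{t+\tau}]$, which can be done by remembering that the $\varepsilon_t$ are i.i.d. Straightforward
algebra eventually shows that, for all $\tau \ge 0$,

$$\mathrm{cov}(s_t, s_{t+\tau}) = E[s_t s_{t+\tau}] - E[s_t]E[s_{t+\tau}] = (\lambda - 3) \sum b_i^2 b_{i+\tau}^2 + 2\Big(\sum b_i b_{i+\tau}\Big)^2.$$

Consequently,

$$\rho_{\tau,s} = \frac{(\lambda - 3) \sum b_i^2 b_{i+\tau}^2 + 2\left(\sum b_i b_{i+\tau}\right)^2}{(\lambda - 3) \sum b_i^4 + 2\left(\sum b_i^2\right)^2}.$$

To simplify this expression, note, first, that $r_t$ has kurtosis $\kappa$ related to the kurtosis
$\lambda$ of $\varepsilon_t$ by

$$\frac{\kappa - 3}{\lambda - 3} = \frac{\sum b_i^4}{\left(\sum b_i^2\right)^2}$$

when $\lambda \ne 3$, second, that the autocorrelations of $\{r_t\}$ are

$$\rho_{\tau,r} = \frac{\sum b_i b_{i+\tau}}{\sum b_i^2},$$

and, third, that the autocorrelations of the process $r_t^* = \sum b_i^2 \varepsilon_{t-i}$ are

$$\alpha_\tau = \frac{\sum b_i^2 b_{i+\tau}^2}{\sum b_i^4}.$$

These three observations permit simplification of the previous equation for $\rho_{\tau,s}$,
to give

$$\rho_{\tau,s} = \frac{(\kappa - 3)\alpha_\tau + 2\rho_{\tau,r}^2}{\kappa - 1}.$$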
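This is equation (4.12). To establish bounds for the nonnegative quantities $\alpha_\tau$,
note that

$$\sum_{\tau=1}^{\infty} \alpha_\tau = \frac{\left(\sum b_i^2\right)^2 - \sum b_i^4}{2 \sum b_i^4}.$$

Let $\theta = \{\mathrm{var}(r_t) - \mathrm{var}(\varepsilon_t)\}/\mathrm{var}(r_t)$, so $\sum b_i^2 = 1/(1 - \theta)$. As $\sum b_i^4 \ge 1$, it
follows that (4.13) is correct, i.e.

$$\sum_{\tau=1}^{\infty} \alpha_\tau \le \frac{1}{2}\left\{\frac{1}{(1-\theta)^2} - 1\right\} = \frac{\theta(2-\theta)}{2(1-\theta)^2},$$

and so inequality (4.14) follows from (4.12) whenever $\kappa > 3$.


Part II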

Part II 


Conditional Expected Returns 


5 


The Variance-Ratio Test of 
the Random Walk Hypothesis 


Comparisons between the variances of one-period and multi-period returns are 
used to test the random walk hypothesis in this chapter. This variance-ratio test 
is straightforward and often powerful for detecting departures from randomness. 
Several empirical examples are discussed, as well as theoretical properties of the 
test statistic. These properties depend on results about the distributions of sample 
autocorrelations. 


5.1 Introduction 


Chapters 5—7 cover tests about the conditional first moment properties of returns. 
These tests answer several questions, including, Are returns unpredictable? and 
Are markets weak-form efficient? Some readers may want to assume the answers 
are "yes" and to focus on understanding price volatility, i.e. conditional second 
moment properties; they should jump to Chapter 8. 

The random walk hypothesis asserts that price changes are unpredictable in 
some way. The hypothesis can be defined by considering the relevance of the 
historical record of returns for the optimal prediction of the next return. It is true 
if the optimal prediction is the same number at all times and for all possible 
histories. Our definitions of optimality are given in Section 5.2. 

Tests of the random walk hypothesis also require data, a test statistic, and 
the distribution of the statistic when the hypothesis is true. Several test statistics 
are available. Their power to identify alternatives to randomness depends on the 
statistic and the alternative. In this chapter we focus on the variance-ratio test of 
Lo and MacKinlay (1988), which is particularly powerful when the alternative is 
either trends in prices or mean-reversion in prices. Many further tests that can be 
more informative for other alternatives are evaluated in the next chapter. 

The variance-ratio test compares the variances of returns measured over two 
different holding periods, for example, one day and one week. The test statistic, 
based on the ratio of variance estimates, is defined in Section 5.3. An example of 
the test calculations is provided in Section 5.4, using Excel, followed by a
discussion of selected test results for daily, weekly, monthly, and annual returns in
Section 5.5. The theoretical properties of the test statistic depend on the
distributions of sample autocorrelations when returns are uncorrelated but conditionally
heteroskedastic. The distributional results are covered in Section 5.6.

The power of random walk tests can be increased by reducing the level of 
conditional heteroskedasticity in the data used for tests. A rescaling transformation 
of returns that achieves this is discussed in Section 5.7. The variance-ratio test 
finds more evidence against randomness after the transformation, as do many of 
the additional tests described in Chapter 6. 


5.2 The Random Walk Hypothesis 
5.2.1 Definitions 


There are several definitions of the random walk hypothesis (RWH). They state 
conditions that incorporate the idea that prices wander (“walk”) in an unpre- 
dictable (“random”) manner. 

One definition that we do not use is that returns have independent and identical 
distributions (i.i.d.). The i.i.d. hypothesis is not very relevant if we are interested 
in the predictability of returns. It will be rejected by an appropriate test if the 
conditional variances of returns have sufficient variation through time, but this 
may tell us nothing about the predictability of returns. For example, the
statistically significant autocorrelation in absolute and squared returns rejects the i.i.d.
hypothesis but it does not prove that returns can be predicted. Even if we test and
reject the i.i.d. hypothesis using the autocorrelations of returns, we still cannot 
reject the hypothesis that returns are uncorrelated at the same significance level. 
This fact will be illustrated in Section 6.2. 

A more general RWH is defined by first replacing identical distributions by 
identical means and second replacing independent distributions by uncorrelated 
distributions. This gives our first definition of the random walk hypothesis: 


$$E[r_t] = E[r_{t+\tau}] \text{ and } \mathrm{cov}(r_t, r_{t+\tau}) = 0 \text{ for all } t \text{ and all } \tau > 0. \tag{RWH1}$$


When RWH1 is true, the returns process is uncorrelated and hence the best linear
prediction of a future return is its unconditional mean, which RWH1 assumes is
a constant. Linear predictors of $r_{t+1}$ are defined by

$$f_{t+1} = \alpha + \sum_{i=0}^{\infty} \beta_i r_{t-i}.$$

RWH1 implies that the mean squared forecast error, $E[(r_{t+1} - f_{t+1})^2]$, is
minimized by the constant predictor given by setting $\alpha = E[r_{t+1}]$ and all $\beta_i = 0$.
Definition RWH1 can be found in the pioneering text by Granger and Morgenstern
(1970). The assumptions are weak. For example, it is not even assumed that
the returns process is stationary. Further assumptions are required if the sample
autocorrelations of returns are used to test RWH1. Lo and MacKinlay (1988) list
three further assumptions that permit asymptotic tests and that are satisfied by a
very general category of conditionally heteroskedastic processes.

The definition RWH1 does not exclude the possibility that a nonlinear predictor
is more accurate than the unconditional expectation. However, illustrations of this
mathematical result do not yield plausible models for returns; an example is given
at the end of Section 3.4. The unconditional mean is the best prediction when our
second definition of RWH applies, namely,

$$E[r_{t+1} \mid I_t] = \mu \text{ for some constant } \mu \text{ and for all times } t \text{ and all return histories } I_t = \{r_{t-i},\ i \ge 0\}. \tag{RWH2}$$

These conditions are the same as saying that returns have a stationary mean $\mu$ and
that the process of excess returns, $\{r_t - \mu\}$, is a martingale difference. Definition
RWH2 has its origins in Samuelson (1965).

RWH2 implies RWH1, whenever returns have finite variance. Most tests of
the random walk hypothesis employ sample autocorrelations and are hence tests
of RWH1. These tests reject RWH2 whenever they reject RWH1, as we assume
returns have finite variance. The distinction between RWH1 and RWH2 is of minor 
importance to us and is generally ignored. Campbell et al. (1997, Section 2.1) also 
discuss definitions of the RWH. 

A stationary mean for returns appears in the definitions to ensure that the sam- 
ple autocorrelations are consistent estimates. Asset pricing models do not, of 
course, require expected returns to be constant through time. Some joint tests of 
time-varying expected returns and zero autocorrelation are given in Section 6.10. 
Further joint tests are possible within an ARCH-M framework and are discussed 
in Sections 10.4 and 10.5. 


5.2.2 Random Walks and Market Efficiency


Tests of the random walk hypothesis can provide insight into issues of market 
efficiency. Nevertheless, random walk tests should not be considered to be tests 
of the weak-form efficient market hypothesis (EMH). 

First, consider the situation when the RWH is false. The EMH can then be true, 
for some definitions of market efficiency, or it too may be false. Prices can fully 
reflect the information in past prices, and thus the EMH holds, as defined by Fama 
(1976, 1991), when the RWH is false. For example, conditional expected returns,
$E[r_{t+1} \mid I_t]$, could depend on previous returns because the asset's risk premium
follows a stationary, autocorrelated process. Or $E[r_{t+1} \mid I_t]$ could be a function
of the conditional variance, $\mathrm{var}(r_{t+1} \mid I_t)$. These expectations and hence returns
are autocorrelated for the ARCH-M models introduced in Section 9.5. Another
possibility is that some linear predictor is more accurate than prediction using
a constant value but transaction costs exceed gross, risk-adjusted payoffs from
trading. Then the EMH holds, as defined by Jensen (1978), yet the RWH is false.
For example, returns could follow an MA(1) process with the moving-average 
parameter so close to zero that net trading profits are impossible. Efficiency might, 
however, be defined as a fair game for excess returns (LeRoy 1989) and then the 
EMH will be false whenever expected returns are constant and RWH2 is false. 

Second, consider the situation when RWH1 is true. Then there may exist a
nonlinear predictor which is more accurate than prediction using a constant value
and, consequently, (i) RWH2 is false, (ii) the EMH can be false using the LeRoy
definition, (iii) the EMH can be false for the Jensen definition when trading
costs are sufficiently low, and (iv) the EMH can be false for Fama's definition
as Jensen inefficiency implies Fama inefficiency. The existence of a successful
nonlinear predictor when RWH1 is true is, however, a theoretical possibility which
is unlikely to have practical relevance.


5.3 Variance-Ratio Tests 
5.3.1 Theoretical Motivation 


The variance of a multi-period return is the sum of single-period variances when 
the RWH is true. Several tests seek to exploit any divergence from this prediction, 
the most important being the variance-ratio test of Lo and MacKinlay (1988). 

To provide some intuition for the test, initially suppose that the stochastic
process generating returns is stationary, with $V(1) = \mathrm{var}(r_t)$. Two-period returns
are the sum of two consecutive returns and their variance equals

$$V(2) = \mathrm{var}(r_t + r_{t+1}) = \mathrm{var}(r_t) + \mathrm{var}(r_{t+1}) + 2\,\mathrm{cov}(r_t, r_{t+1}) = (2 + 2\rho_1)V(1), \tag{5.1}$$

with $\rho_1$ the first-lag autocorrelation of one-period returns. The two-period variance
ratio is defined by

$$\mathrm{VR}(2) = \frac{V(2)}{2V(1)} = 1 + \rho_1. \tag{5.2}$$

The autocorrelation term is zero when the RWH applies and then the variance
ratio is one. Otherwise, the RWH is false and the ratio can be either more or less
than one.

Next consider \(N\)-period returns for any integer \(N \geq 2\). When the RWH is true, 

\[V(N) = \mathrm{var}(r_t + r_{t+1} + \cdots + r_{t+N-1}) = \mathrm{var}(r_t) + \mathrm{var}(r_{t+1}) + \cdots + \mathrm{var}(r_{t+N-1}) = N V(1)\]

and thus the variance ratio is unity for all \(N\): 

\[\mathrm{VR}(N) = \frac{V(N)}{N V(1)} = 1. \tag{5.3}\]

When the RWH is false, \(V(N)\) equals \(N V(1)\) plus the covariance terms between 
all pairs of distinct returns; thus 

\[V(N) = N V(1) + 2 \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \mathrm{cov}(r_{t+i}, r_{t+j}) = V(1)\Big[N + 2 \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \rho_{j-i}\Big]. \tag{5.4}\]

The double summation can be simplified to give the variance ratio as 

\[\mathrm{VR}(N) = 1 + \frac{2}{N} \sum_{\tau=1}^{N-1} (N - \tau)\rho_\tau. \tag{5.5}\]

The empirical test uses observed returns to decide if a sample estimate of 
the variance ratio is compatible with the theoretical prediction of one stated by 
equation (5.3). The test is most likely to reject the RWH when the ratio in 
equation (5.5) is far from one. This happens when a linear function of the first 
\(N - 1\) autocorrelations, namely 

\[(N - 1)\rho_1 + (N - 2)\rho_2 + \cdots + 2\rho_{N-2} + \rho_{N-1},\]

is far from zero. The multiplier is \(N - \tau\) for \(\rho_\tau\). All the multipliers are positive and 
they decrease as the lag increases. A variance-ratio test is therefore particularly 
appropriate when the alternative to randomness involves autocorrelations that all 
have the same sign and that decrease as the lag increases. These properties are 
possessed by ARMA(1, 1) processes that have a positive autoregressive parameter 
\(\phi\) and then \(\rho_\tau = A\phi^\tau\), \(\tau \geq 1\). Two important examples are described in 
Section 3.6. One example has mean reversion in prices (\(A < 0\) and \(\mathrm{VR}(N) < 1\)), 
while the other has trends in prices (\(A > 0\) and \(\mathrm{VR}(N) > 1\)). 
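
The dependence of the theoretical variance ratio on the autocorrelations can be sketched in a few lines of Python. This is only an illustration of equation (5.5): the function names and the parameter values A = ±0.02 and phi = 0.95 are hypothetical choices made here, not estimates taken from the text.

```python
def variance_ratio_from_acf(rho, N):
    """Equation (5.5): VR(N) = 1 + (2/N) * sum_{tau=1}^{N-1} (N - tau) * rho(tau).

    `rho` is any function returning the lag-tau autocorrelation.
    """
    return 1.0 + (2.0 / N) * sum((N - tau) * rho(tau) for tau in range(1, N))


def arma11_acf(A, phi):
    """Autocorrelations rho_tau = A * phi**tau (tau >= 1) of an ARMA(1, 1) process."""
    return lambda tau: A * phi ** tau


# Hypothetical parameter values, chosen only to show the two cases in the text.
vr_reverting = variance_ratio_from_acf(arma11_acf(-0.02, 0.95), 20)  # A < 0
vr_trending = variance_ratio_from_acf(arma11_acf(0.02, 0.95), 20)    # A > 0
```

With all autocorrelations equal to zero the function returns exactly one, and for N = 2 it reduces to 1 + rho_1 as in equation (5.2).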


5.3.2 The Test Statistic 


The researcher must first choose a value for \(N\). Indeed, the choice can appear 
to be arbitrary. Suppose a set of \(n\) observed returns has average \(\bar{r}\) and variance 
\(\hat{V}(1) = \sum (r_t - \bar{r})^2/(n - 1)\). An appropriate estimate of \(V(N)\) is 

\[\hat{V}(N) = \frac{n}{(n - N)(n - N + 1)} \sum_{t=1}^{n-N+1} (r_t + r_{t+1} + \cdots + r_{t+N-1} - N\bar{r})^2 \tag{5.6}\]

and then the sample variance ratio is 

\[\widehat{\mathrm{VR}}(N) = \frac{\hat{V}(N)}{N \hat{V}(1)}. \tag{5.7}\]
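
A minimal NumPy sketch of equations (5.6) and (5.7) follows. The function name `sample_variance_ratio` is an illustrative choice, and the cumulative-sum device for forming the overlapping N-period excess returns is simply one convenient way to organize the arithmetic.

```python
import numpy as np


def sample_variance_ratio(returns, N):
    """Sample variance ratio of equations (5.6) and (5.7)."""
    r = np.asarray(returns, dtype=float)
    n = r.size
    d = r - r.mean()                                  # r_t - r_bar
    v1 = np.sum(d ** 2) / (n - 1)                     # one-period variance V-hat(1)
    # Overlapping N-period excess returns for t = 1, ..., n - N + 1.
    csum = np.concatenate(([0.0], np.cumsum(d)))
    excess = csum[N:] - csum[:-N]
    vN = n * np.sum(excess ** 2) / ((n - N) * (n - N + 1))   # equation (5.6)
    return vN / (N * v1)                              # equation (5.7)
```

Two simple checks are available: with N = 1 the estimate in (5.6) collapses to the one-period variance, so the ratio is one, and a perfectly alternating series has two-period returns of zero and hence a ratio of zero.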




This ratio is very similar to the linear function of sample autocorrelations, \(\hat{\rho}_\tau\), that 
is suggested by (5.5): 

\[\widehat{\mathrm{VR}}(N) \cong 1 + \frac{2}{N} \sum_{\tau=1}^{N-1} (N - \tau)\hat{\rho}_\tau. \tag{5.8}\]

The RWH should be rejected if the sample variance ratio is significantly far from 
one. We can only decide what is significant if we have a distribution for \(\widehat{\mathrm{VR}}(N)\) 
when the RWH is true. This distribution can be obtained after making technical 
assumptions that are discussed later, in Section 5.6; it is not necessary to assume 
the returns process is stationary. An estimate of the variance of \(\widehat{\mathrm{VR}}(N)\) follows 
from (5.8) and estimates of the variances of the first \(N - 1\) autocorrelations \(\hat{\rho}_\tau\). 
An appropriate estimate of \(n\,\mathrm{var}(\hat{\rho}_\tau)\) is provided by 

\[b_\tau = \frac{n \sum_{t=1}^{n-\tau} s_t s_{t+\tau}}{\big(\sum_{t=1}^{n} s_t\big)^2}, \quad \text{with } s_t = (r_t - \bar{r})^2, \tag{5.9}\]

and then an estimate of \(n\,\mathrm{var}(\widehat{\mathrm{VR}}(N))\) is given by 

\[v_N = \frac{4}{N^2} \sum_{\tau=1}^{N-1} (N - \tau)^2 b_\tau. \tag{5.10}\]

A useful and very accurate approximation to \(b_\tau\) can be calculated from the kurtosis 
of returns, \(k\), and the autocorrelations \(\hat{\rho}_{\tau,s}\) of the terms \(s_t = (r_t - \bar{r})^2\): 

\[b_\tau \cong 1 + (k - 1)\hat{\rho}_{\tau,s}. \tag{5.11}\]


The above estimates are consistent when the RWH is true; they then converge to the 
parameters that they estimate as \(n\) increases. Finally, the standardized distribution 
of the sample variance ratio, 

\[z_N = \frac{\widehat{\mathrm{VR}}(N) - 1}{\sqrt{v_N/n}}, \tag{5.12}\]

is approximately the standard normal distribution when the RWH is true. This is 
an asymptotic result, so the approximation becomes perfect as \(n \to \infty\). 

Comparisons of \(z_N\) with \(N(0, 1)\) provide satisfactory results when \(n/N\) is 
“large” but they can be unsatisfactory when \(n/N\) is “small.” Problems are only 
likely to arise when returns are recorded monthly or less often. The asymptotic 
result should certainly be applicable when \(n/N > 100\). Further details of the 
derivation and reliability of variance-ratio tests are given by Lo and MacKinlay 
(1988, 1989), Cochrane (1988), and Campbell et al. (1997). 

Tests are often performed for more than one value of \(N\). It is then necessary to 
avoid using hindsight to find the most interesting value of \(N\). Chow and Denning 
(1993) discuss this problem and show how to perform a test by using the maximum 
value of \(|z_N|\). 

Nonparametric variations of the variance-ratio test can also be performed. These 
tests are evaluated using the ranks and signs of returns, and are investigated by 
Wright (2000) and Luger (2003). 
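
The complete test statistic can be assembled from equations (5.6), (5.7), (5.9), (5.10), and (5.12) as in the following sketch. The function name `variance_ratio_test` is hypothetical, and the code uses the direct estimates of equation (5.9) rather than the kurtosis approximation of equation (5.11).

```python
import numpy as np


def variance_ratio_test(returns, N):
    """Return the sample variance ratio and the test statistic z_N."""
    r = np.asarray(returns, dtype=float)
    n = r.size
    d = r - r.mean()
    s = d ** 2                                        # s_t = (r_t - r_bar)^2
    v1 = s.sum() / (n - 1)
    csum = np.concatenate(([0.0], np.cumsum(d)))
    excess = csum[N:] - csum[:-N]                     # N-period excess returns
    vr = n * np.sum(excess ** 2) / ((n - N) * (n - N + 1)) / (N * v1)
    taus = np.arange(1, N)
    # Equation (5.9): estimates b_tau of n var(rho-hat_tau).
    b = np.array([n * np.sum(s[:-tau] * s[tau:]) / s.sum() ** 2 for tau in taus])
    v_N = 4.0 * np.sum((N - taus) ** 2 * b) / N ** 2  # equation (5.10)
    z = (vr - 1.0) / np.sqrt(v_N / n)                 # equation (5.12)
    return vr, z
```

When several values of N are examined, the statistics returned by repeated calls can be collected and their largest absolute value inspected, in the spirit of the Chow and Denning approach mentioned above; the critical values for that maximum are not those of a single standard normal variable.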


5.4 An Example of Variance-Ratio Calculations 


It is easy to calculate variance ratios and their RWH test values. We now provide 
our first example of calculations using elementary software, namely Excel. There 
are, of course, more efficient ways to organize the calculations, for example, by 
using Visual Basic functions, but these require some programming skill. This 
section can be skipped by readers who are not interested in software. 

Exhibit 5.1 shows calculations for the S&P 100 index from January 1991 to 
December 2000. The most important Excel formulae used are listed in Table 5.1. 
The time series is defined and graphed in Section 2.2. It contains 2531 index levels 
\(p_t\) that are located in cells B12 to B2542. The results are obtained when \(N = 5\). 

The first steps are to calculate the returns \(r_t\), ignoring dividends, and a few 
summary statistics. To avoid small numbers, we work with percentage returns and 
henceforth drop the percentage adjective. The return for 3 January 1991, which 
is period \(t = 1\), is obtained by inserting into cell C13 the formula 
=100*LN(B13/B12). The remaining returns are obtained by selecting and copying 
cell C13, followed by pasting it into cells C14 to C2542. The number, average, and 
kurtosis of the returns are required and can be found in cells B3 to B5, making use 
of the functions COUNT, AVERAGE, and KURT. It is necessary to add three to the 
statistic supplied by KURT as Excel calculates the excess kurtosis. 

Next, columns D, E, and F are respectively filled with the terms \(r_t - \bar{r}\), 
\(s_t = (r_t - \bar{r})^2\), and \(r_t + r_{t+1} + \cdots + r_{t+N-1} - N\bar{r}\). After typing formulae into cells D13 
and E13, namely =C13-$B$4 and =D13*D13, columns D and E are completed 
by copying and pasting the 1 x 2 rectangle D13:E13. The first five-period excess 
return, \(r_1 + r_2 + \cdots + r_5 - 5\bar{r}\), is given by applying the SUM function to cells 
D13:D17, to give the result in cell F13. The remainder of column F is given by 
copying and pasting cell F13, stopping at cell F2538, which corresponds to period 
\(t = n - N + 1\). 

Now we can calculate the sample variance ratio. The one-period sample variance 
\(\hat{V}(1)\) is given in cell E3 by the function VAR. The estimate \(\hat{V}(N)\) of the 
N-period variance in cell E4 uses equation (5.6) and the function SUMSQ, which 
here adds the squares of the N-period excess returns. The sample ratio \(\widehat{\mathrm{VR}}(N)\) 
follows immediately in cell E5, and on this occasion equals 0.9050. 

To calculate the test statistic \(z_N\) defined by equation (5.12), we must first find 
estimates \(b_\tau\) of \(n\,\mathrm{var}(\hat{\rho}_\tau)\), for \(1 \leq \tau \leq N - 1\). These are calculated here by using 
equation (5.11). A value for the autocorrelation in that equation can be obtained 
from the Excel function CORREL applied to two ranges containing the terms 
\(s_t = (r_t - \bar{r})^2\). For \(\tau = 1\), the first range covers times 1 to \(n - 1\) and the second 
covers times 2 to \(n\). The formula for \(b_1\), in cell H2, can be copied down column H 
and then edited to give the correct formulae for the other estimates \(b_\tau\). Column I 
contains the values of \(4(N - \tau)^2 b_\tau/N^2\) whose sum defines \(v_N\) in equation (5.10). 
The quantity \(v_N\) can be found in cell E6. Finally, the test statistic \(z_N\) is calculated 
from \(n\), \(\widehat{\mathrm{VR}}(N)\) and \(v_N\), here giving the value -1.406 in cell E7. 

Table 5.1. Formulae used in the variance-ratio spreadsheet. 

Cell   Formula                                              Note 
B3     =COUNT(C13:C2542)                                    a 
B4     =AVERAGE(C13:C2542)                                  a 
B5     =KURT(C13:C2542)+3                                   a 
C13    =100*LN(B13/B12) 
D13    =C13-$B$4 
E3     =VAR(C13:C2542)                                      a 
E4     =B3*SUMSQ(F13:F2538)/((B3-E2)*(B3-E2+1))             a 
E5     =E4/(E2*E3) 
E6     =SUM(I2:I5)                                          b 
E7     =(E5-1)/SQRT(E6/B3) 
E13    =D13*D13 
F13    =SUM(D13:D17)                                        b 
H2     =1+($B$5-1)*CORREL($E$13:E2541,E14:$E$2542) 
H3     =1+($B$5-1)*CORREL($E$13:E2540,E15:$E$2542)          a 
I2     =H2*(2*($E$2-G2)/$E$2)^2 

Notes: a, assumes n = 2530; b, assumes N = 5. 

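These calculations rely on the correlation function CORREL, which gives numbers 
different from but very similar to the sample autocorrelations defined by 

\[\hat{\rho}_{\tau,s} = \frac{\sum_{t=1}^{n-\tau} (s_t - \bar{s})(s_{t+\tau} - \bar{s})}{\sum_{t=1}^{n} (s_t - \bar{s})^2}.\]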
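
For readers working outside a spreadsheet, the two routes to \(b_\tau\) can be compared in Python. In this sketch the function names are hypothetical, the kurtosis is the simple ratio of the fourth central moment to the squared variance (Excel's KURT applies small-sample adjustments, so its value differs slightly), and the autocorrelation of the squares uses the definition displayed above rather than CORREL.

```python
import numpy as np


def b_direct(returns, tau):
    """Equation (5.9): b_tau = n * sum(s_t * s_{t+tau}) / (sum s_t)^2."""
    r = np.asarray(returns, dtype=float)
    s = (r - r.mean()) ** 2
    return r.size * np.sum(s[:-tau] * s[tau:]) / s.sum() ** 2


def b_from_kurtosis(returns, tau):
    """Equation (5.11): b_tau is approximately 1 + (k - 1) * rho-hat_{tau,s}."""
    r = np.asarray(returns, dtype=float)
    s = (r - r.mean()) ** 2
    k = np.mean(s ** 2) / np.mean(s) ** 2             # kurtosis (assumed population form)
    ds = s - s.mean()
    rho_s = np.sum(ds[:-tau] * ds[tau:]) / np.sum(ds ** 2)
    return 1.0 + (k - 1.0) * rho_s
```

For a long series the two functions should agree closely, because they differ only by end effects whose size is of order \(\tau/n\).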


5.5 Selected Test Results 
5.5.1 Daily Returns 


Some results are discussed for 22 series that each contain ten years of daily returns. 
The ratios of N-period to one-period variances are tested for three values of \(N\). 
When \(N = 2\), the test statistic \(z_2\) gives the same results as the standardized 
first-lag autocorrelation, \(\hat{\rho}_1/\sqrt{b_1/n}\), from equations (5.8), (5.10), and (5.12). When 
\(N = 5\) and \(N = 20\), the test statistics respectively compare approximations to 
weekly and monthly variances with daily variances. 

First consider two series of returns, from the S&P 100 index and the spot DM/$ 
exchange rate, both from 1991 to 2000. Their variance ratios \(\widehat{\mathrm{VR}}(N)\) are 

                   N=2      N=5      N=20 
S&P 100 index      0.976    0.905    0.759 
Spot DM/$          1.018    1.042    1.036 

Figure 5.1. S&P 100 autocorrelations. 

Figure 5.2. Spot DM/$ autocorrelations. 

[Figures 5.1 and 5.2 plot correlation against time lag (trading days) for returns 
and rescaled returns, together with 95% lower and upper limits.] 


The corresponding test statistics \(z_N\), defined by equation (5.12), are 

                   N=2      N=5      N=20 
S&P 100 index     -0.73    -1.41    -1.76 
Spot DM/$          0.73     0.80     0.32 

All these test values accept the RWH at the 5% significance level, because they are 
within the symmetric range that contains 95% of the probability of the standard 
normal distribution (-1.96 to 1.96). 
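
The comparison with the standard normal distribution can be expressed as a two-tailed p-value. The sketch below uses only the Python standard library; the function names are illustrative.

```python
import math


def two_tailed_p_value(z):
    """P(|Z| >= |z|) for a standard normal variable Z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def rejects_rwh(z, level=0.05):
    """True when the test value lies outside the central (1 - level) range."""
    return two_tailed_p_value(z) < level
```

A test value of 1.96 gives a p-value very close to 0.05, so any of the six values tabulated above, all smaller than 1.96 in magnitude, leads to acceptance at the 5% level.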

The absence of statistically significant results reflects the low autocorrelations 
of the returns. Figures 5.1 and 5.2 show the values of \(\hat{\rho}_\tau\), marked using solid 
circles and joined by solid lines. Only a few of these estimates, shown beyond 
the dotted lines at ±0.04, are outside the 95% confidence interval for an i.i.d. 
process, namely \(\pm 1.96/\sqrt{n}\). Appropriate intervals are wider, however, because 
the estimated sample variances of the \(\hat{\rho}_\tau\) are \(b_\tau/n\) with \(b_\tau > 1\). Figure 5.3 shows 
that the values of \(b_\tau\) commence above 2.5 for the index series and often exceed 1.5; 
the estimates are again marked by circles and are calculated from equation (5.9). 
The values of \(b_\tau\) in Figure 5.4 are nearer one for the currency series but many of 
the estimates exceed 1.25. 

Figure 5.4. DM/$ autocorrelation variance estimates. 

Next consider the twenty equity, currency, and commodity series defined by 
Table 2.2, for which autocorrelations were documented in Section 4.9. Their test 
statistics \(z_N\) are listed in Table 5.2. Six of these series reject the RWH at the 
5% significance level when \(N = 2\). All the significant values indicate positive 
dependence, because \(z_2 > 1.96\), and five of them are for equity series, including 
the spot returns for the S&P 500 and the FTSE 100 indices. These results show 
that some series have significant dependence between consecutive returns. There 
are only three rejections when \(N = 5\) and also three when \(N = 20\). Overall, these 
tests do not provide much evidence against randomness. 

Table 5.2. Variance-ratio test values. 

                                       Test statistic 
                                   Returns                 Rescaled returns 
Series                          z_2     z_5     z_20      z_2     z_5     z_20 

Series from 1991 to 2000 
S&P 100-share            S    -0.73   -1.41   -1.76      0.17   -0.92   -1.58 
DM/$                     S     0.73    0.80    0.32      1.59    1.55    1.41 

Twenty further series 
S&P 500-share            S     4.00*   2.66*   0.62      5.20*   4.46*   1.90 
S&P 500-share            F    -1.40   -1.43   -1.75     -0.75   -0.68   -1.02 
Coca Cola                S    -1.24   -2.33*  -2.05*     0.16   -0.85   -1.06 
General Electric         S    -0.92   -1.93   -1.27     -0.73   -0.82   -0.83 
General Motors           S     0.57   -1.29   -0.75      1.34   -0.20    0.16 
FT 100-share             S     2.51*   1.50    1.68      3.80*   3.57*   4.30* 
FT 100-share             F    -0.47   -1.23   -0.51      0.72    0.25    2.13* 
Glaxo                    S     3.56*   1.85    0.48      5.88*   4.13*   2.24* 
Marks & Spencer          S     1.96*   0.40   -1.44      2.80*   1.54   -0.22 
Shell                    S     2.34*   2.52*   0.34      4.68*   4.10*   1.13 
Nikkei 225-share         S     1.83    0.01    0.46      3.57*   2.69*   3.76* 
Treasury bonds           F     0.1     0.47    0.48      1.73    0.91    1.46 
3-month sterling bills   F     1.21   -0.30    0.40      4.91*   4.54*   4.70* 
DM/$                     F    -0.01    0.04    1.09      0.78    1.48    3.19* 
Sterling/$               F     1.14    0.23    0.46      1.00    0.71    1.96 
Swiss franc/$            F    -0.55   -0.57    0.49     -0.09   -0.10    1.43 
Yen/$                    F    -0.01    0.55    [?]       0.22    0.94    3.36* 
Gold                     F    -1.88   -0.35   -0.47     -0.82    0.13    0.28 
Corn                     F     2.94*   1.82    2.33*     4.70*   3.14*   4.22* 
Live cattle              F     0.56    0.94   -0.00      0.52    1.08    0.24 

The crash week, commencing on 19 October 1987, is excluded from the time series. Stars 
identify test values that reject the RWH at the 5% level, for two-tailed tests. The test statistics 
are defined by equation (5.12). 


5.5.2 Weekly Returns 


Lo and MacKinlay (1988) [LM] focus on weekly returns from US indices, port- 
folios, and firms between 1962 and 1985. Campbell, Lo, and MacKinlay (1997) 
[CLM] update many of the calculations by extending the sample to 1994, to 
provide series of 1695 weekly returns. 

CLM first consider equal and value-weighted indices calculated by pooling 
returns from the New York (NYSE) and American (AMEX) stock exchanges. 
These market indices provide the following variance ratios for the four values of 
N that they used: 


N=2 N=4 N=8 N=16 
Equal weighted 1.20 1.42 1.65 1.74 
Value weighted 1.02 1.02 1.04 1.02 


All of these variance ratios imply positive dependence, which is obviously more 
pronounced when the portfolio weights are all equal. The test statistics \(z_N\) are as 
follows: 


N=2 N=4 N=8 N=16 
Equal weighted 4.53 5.30 5.84 4.85 
Value weighted 0.51 0.30 0.41 0.14 


The RWH is therefore rejected at very low significance levels by the 
equal-weighted index. The data of CLM accept the RWH for the value-weighted index, 
in contrast to the subset studied by LM that rejects at the 5% level for three of the 
four values of \(N\) (\(z_N\) = 2.33, 2.31, 2.07, 1.38). 

Comparing the equal and value-weighted results, portfolios of smaller firms 
are anticipated to have more dependence than those for larger firms. CLM sort all 
firms by market capitalization into five size portfolios and then calculate further 
variance ratios. These exceed one but accept the RWH for the portfolio of large 
firms. They are much larger for the other portfolios and reject the RWH, with 
\(z_N > 4\) for medium-size firms and \(z_N > 7\) for small-size firms. 

The rejections of the RWH for portfolio returns may merely reflect stale prices. 
This possibility is supported by CLM’s results for individual firms. The variance 
ratios for the 411 firms whose securities were traded throughout their sample 
period are usually near one and their averages across firms are less than one. Thus 
the positive dependence in portfolios is not found at the level of the individual firm. 
Many studies have attempted to say why this has occurred. See, for example, the 
collection of papers by Lo and MacKinlay (1999) and the references and methods 
of Boudoukh et al. (1994) and Ahn et al. (2002). 


5.5.3 Monthly Returns 


Poterba and Summers (1988) include comparisons of monthly and annual 
variances in their Tables 2 and 4, so then \(N = 12\). They tabulate the reciprocal of the 
standard variance-ratio definition, so we consider the inverses of their numbers. 
For US market returns in excess of the risk-free rate, from 1926 to 1985, the 
variance ratio from the value-weighted index is \(\widehat{\mathrm{VR}}(12) = 1.31\) with a similar 
ratio of 1.27 for the equal-weighted index. The market indices for many other 
countries, from 1957 (or earlier) until 1986, have values of \(\widehat{\mathrm{VR}}(12)\) that exceed 
unity. These ratios include 1.20 for the UK (from 1939 to 1986), 1.14 for France, 
1.64 for Germany, and 1.15 for Japan. Thus there has been evidence for positive 
dependence in monthly market returns, although it appears that many of the test 
values do not reject the RWH at conventional significance levels. 


5.5.4 Annual Returns 


Poterba and Summers (1988) investigate the evidence for mean-reversion in 
annual stock returns by studying a modified variance ratio. This is calculated 
from monthly returns as 

\[\mathrm{VR}(j, 12) = \frac{\hat{V}(j)/j}{\hat{V}(12)/12}.\]

It is similar to the usual ratio \(\widehat{\mathrm{VR}}(N)\) calculated from annual returns, when 
\(N = j/12\). The modified ratio is less than one when \(j > 36\), i.e. \(N > 3\), for US market 
indices from either 1871 or 1926 until 1985. For the sixty-year period commencing 
in 1926, the six-year variance ratios are 0.78 and 0.65, respectively for value- 
and equal-weighted excess returns. The small number of nonoverlapping N-year 
returns inevitably makes it difficult to find statistically significant variance ratios. 
Subsequent research by Kim, Nelson, and Startz (1998) and others shows that the 
deviations of the annual variance ratios from unity could well be compatible with 
the RWH. Related literature on tests for long-horizon returns includes Richardson 
and Stock (1989), Kim, Nelson, and Startz (1991), Richardson (1993), and Daniel 
(2001). 


5.5.5 Markets Worldwide 


Variance-ratio test results have been reported for very many markets and a few 
studies of daily and weekly returns are noted here. Liu and He (1991) and Luger 
(2003) both test five major exchange rate series, while Lee, Pan, and Liu (2001) 
investigate nine Australasian FX series. Peterson, Ma, and Ritchey (1992) cover 
seventeen commodity markets. Poon (1996) provides detailed results for the UK 
stock market for indices, portfolios, and individual stocks. Yilmaz (2001) gives 
results for twelve emerging markets (primarily in Latin America and Asia), the 
US, and Japan, while Gilmore and McManus (2003) cover three Central European 
markets. 


5.6 Sample Autocorrelation Theory 


The variance-ratio test and many other random walk tests rely on theoretical 
results about the distributions of a set of sample autocorrelations. These results 
can be skipped by readers who are not interested in theoretical results. 

The sample autocorrelations are defined as 

\[\hat{\rho}_\tau = \frac{\sum_{t=1}^{n-\tau} (r_t - \bar{r})(r_{t+\tau} - \bar{r})}{\sum_{t=1}^{n} (r_t - \bar{r})^2}, \quad \tau > 0. \tag{5.13}\]

We suppose the terms \(r_t\) and \(\hat{\rho}_\tau\) are random variables in this section. Note that 
the sample autocorrelations are biased. Their sum across all lags is always 

\[\sum_{\tau=1}^{n-1} \hat{\rho}_\tau = -\frac{1}{2}. \tag{5.14}\]

This shows that \(E[\hat{\rho}_\tau] \neq \rho_\tau\) in general. Providing \(\sum \rho_\tau\) converges as \(n\) increases, 
the bias is of order \(1/n\). The magnitude of the bias depends on the lag. Processes 
with high positive values of \(\sum \rho_\tau\) have the most bias. 
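
The identity (5.14) holds for any set of numbers, because the deviations from the sample mean sum to zero and so the squared sum of deviations, which equals the sum of squares plus twice all the cross-products, is zero. A short numerical check, with an illustrative function name, is:

```python
import numpy as np


def sample_autocorrelations(returns, max_lag):
    """Sample autocorrelations of equation (5.13) for lags 1, ..., max_lag."""
    r = np.asarray(returns, dtype=float)
    d = r - r.mean()
    denom = np.sum(d ** 2)
    return np.array([np.sum(d[:-tau] * d[tau:]) / denom
                     for tau in range(1, max_lag + 1)])


# Arbitrary illustrative data: the sum over all n - 1 lags is -1/2 regardless.
x = np.array([0.3, -1.2, 0.8, 0.1, -0.4, 1.5, -0.9, 0.2])
total = sample_autocorrelations(x, x.size - 1).sum()
```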


5.6.1 Results for I.I.D. Processes 


When the \(r_t\) are i.i.d. and have finite variance, the asymptotic distribution of \(\sqrt{n}\,\hat{\rho}_\tau\) 
is normal with zero mean and unit variance for all positive \(\tau\). Also, \(\hat{\rho}_\tau\) and \(\hat{\rho}_\xi\) are 
asymptotically independent whenever \(\tau \neq \xi\). It is not necessary to assume that 
the process is Gaussian (Anderson and Walker 1964). 

The finite-sample expectation of \(\hat{\rho}_\tau\) is proportional to the number of terms in 
its numerator, and thus 

\[E[\hat{\rho}_\tau] = -\frac{n - \tau}{n(n - 1)} \tag{5.15}\]

from (5.14). Also \(\mathrm{var}(\hat{\rho}_\tau)\) is less than \(1/n\), and the covariance between sample 
autocorrelations at different lags is negative and of order \(1/n^2\) when 
\(n > 2\max(\tau, \xi)\). All these results follow from the methods in Moran (1967). 

The bias is not always trivial when tests are performed. Consider the scaled 
sum of the first \(N\) autocorrelations: 

\[S = \sqrt{\frac{n}{N}} \sum_{\tau=1}^{N} \hat{\rho}_\tau. \tag{5.16}\]

Then the asymptotic distribution of \(S\) is standard normal but the exact expectation 
of \(S\) is approximately \(-\sqrt{N/n}\) when \(N \ll n\); this is -0.3 when \(N = 250\) and 
\(n = 2500\). Tests based upon \(S\) are similar to the regression slope tests of Jegadeesh 
(1991) when single-period returns are regressed against N-period returns 
(Section 6.3). The obvious way to reduce the effects of the bias when testing the RWH 
is to increase each term \(\hat{\rho}_\tau\) by \((n - \tau)/(n(n - 1))\). 
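
The size of the bias, and the adjustment just described, can be checked with a few lines of Python. The function names below are illustrative; the formulas are those of equations (5.15) and (5.16).

```python
import math


def expected_rho_iid(tau, n):
    """Equation (5.15): finite-sample expectation of rho-hat_tau for i.i.d. returns."""
    return -(n - tau) / (n * (n - 1))


def expected_S_iid(N, n):
    """Expectation of the scaled sum S of equation (5.16) for i.i.d. returns."""
    return math.sqrt(n / N) * sum(expected_rho_iid(tau, n) for tau in range(1, N + 1))


def bias_corrected_S(rho_hat, n):
    """S computed after adding (n - tau) / (n (n - 1)) to each rho-hat_tau."""
    N = len(rho_hat)
    adjusted = [rho + (n - tau) / (n * (n - 1))
                for tau, rho in enumerate(rho_hat, start=1)]
    return math.sqrt(n / N) * sum(adjusted)
```

Evaluating `expected_S_iid(250, 2500)` gives a value close to -0.30, in agreement with the approximation \(-\sqrt{N/n}\) quoted above.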


5.6.2 Results for Uncorrelated Processes 


Next we consider uncorrelated processes that have a stationary mean, \(\mu = E[r_t]\). 
The important consequence of relaxing the i.i.d. assumption is that we lose the 
general result that the asymptotic variance of \(\sqrt{n}\,\hat{\rho}_\tau\) is unity. 

Estimates of the variance can be obtained when further assumptions are made, 
in addition to the random walk hypothesis. Taylor (1984, 1986) assumes the 
uncorrelated process has a property of multivariate symmetry. For continuous 
distributions, this property requires the multivariate probability density function of the 
vector \((r_{t+1}, r_{t+2}, \ldots, r_{t+n})\) to depend only on the \(n\) terms \(|r_{t+i} - \mu|\), \(1 \leq i \leq n\), 
for all positive integers \(n\) and \(t\). Lo and MacKinlay (1988) do not assume 
symmetry and instead add different assumptions which restrict the nonlinear dependence 
and heterogeneity in the returns process. Their mixing assumptions are defined in 
White (1984). The additional assumptions of both Taylor and Lo and MacKinlay 
permit very general forms of conditional heteroskedasticity, including calendar 
terms. The estimated variance of \(\sqrt{n}\,\hat{\rho}_\tau\) is then 

\[b_\tau = n \sum_{t=1}^{n-\tau} (r_t - \bar{r})^2 (r_{t+\tau} - \bar{r})^2 \Big/ \Big(\sum_{t=1}^{n} (r_t - \bar{r})^2\Big)^2. \tag{5.17}\]



This estimate is almost unbiased when the process has multivariate symmetry. 
The asymptotic variance is defined when the following limits exist and are 
finite: 

\[c_\tau = \lim_{n \to \infty} \frac{1}{n} \sum_{t=1}^{n} E[(r_t - \mu)^2 (r_{t+\tau} - \mu)^2]\]

and 

\[\sigma^2 = \lim_{n \to \infty} \frac{1}{n} \sum_{t=1}^{n} E[(r_t - \mu)^2].\]

The asymptotic variance is then given by 

\[\beta_\tau = \lim_{n \to \infty} n\,\mathrm{var}(\hat{\rho}_\tau) = c_\tau/\sigma^4. \tag{5.18}\]

Also, the asymptotic distribution of \(\sqrt{n}\,\hat{\rho}_\tau\) is \(N(0, \beta_\tau)\) and the variables \(\hat{\rho}_\tau\) and 
\(\hat{\rho}_\xi\) are asymptotically independent whenever \(\tau \neq \xi\) (Lo and MacKinlay 1988). 
If, furthermore, the process \(\{r_t\}\) is stationary with finite kurtosis, 

\[k = E[(r_t - \mu)^4]/E[(r_t - \mu)^2]^2,\]

then 

\[\beta_\tau = \lim_{n \to \infty} n\,\mathrm{var}(\hat{\rho}_\tau) = 1 + (k - 1)\rho_{\tau,s} \tag{5.19}\]

with \(\rho_{\tau,s}\) denoting the autocorrelation function of the squares process defined by 
\(s_t = (r_t - \mu)^2\). Realistic processes for the squares have positive autocorrelations 
and then \(\beta_\tau > 1\). A typical kurtosis estimate is 6 and a typical autocorrelation 
for squares at a low lag is 0.15, for which \(\beta_\tau = 1.75\). The values of \(\beta_\tau\) can be 
arbitrarily large for the volatility models described in Chapters 9-11. 
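
As a worked check of the typical figures just quoted, and of what they imply for the width of a 95% band around zero, the arithmetic can be laid out as follows. The sample size \(n = 2500\) is an assumption made here for illustration (roughly ten years of daily returns); it is not taken from the paragraph above.

\[\beta_\tau = 1 + (k - 1)\rho_{\tau,s} = 1 + (6 - 1)(0.15) = 1.75,\]

\[1.96\sqrt{\beta_\tau/n} = 1.96\sqrt{1.75/2500} \approx 0.052, \quad \text{compared with} \quad 1.96/\sqrt{2500} \approx 0.039 \text{ for an i.i.d. process.}\]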


5.6.3 Autocorrelated Processes 


The asymptotic distributions of sample autocorrelations for processes that are a 
linear combination of terms from an i.i.d., finite-variance process are described 
by many writers, including Anderson and Walker (1964), Kendall, Stuart, and Ord 
(1983), and Brockwell and Davis (1991). The finite-sample bias for the general 
linear process is given by Lomnicki and Zaremba (1957). The implications of the 
results in these sources for correlated returns processes are discussed in Taylor 
(1982a, 1986). 

Asymptotic theory shows that the variance of \(\sqrt{n}\,\hat{\rho}_\tau\) (defined for returns) can 
differ from one if either (i) returns are uncorrelated but squared returns are 
correlated, thus indicating a nonlinear process, or (ii) returns are generated by a 
correlated, linear process. Conditional heteroskedasticity is sufficient to produce 
variances far from one but realistic correlations between returns are not sufficient 
to do this. Consequently, asymptotic variances for an appropriate correlated, 
nonlinear process for returns will be predominantly determined by the nonlinear 
structure of the process. 


5.7 Random Walk Tests Using Rescaled Returns 


Returns do not have constant conditional variances and this is the primary reason 
for their autocorrelations having more variability than those calculated from i.i.d. 
processes. The excess variability can often be reduced substantially if we can find 
a way to rescale returns that ensures the rescaled quantities have approximately 
constant conditional variances. 


5.7.1 Definition 


Rescaled returns are defined by 

\[r_t^* = \frac{r_t - \bar{r}}{\sqrt{h_t}}, \tag{5.20}\]

with \(h_t\) a conditional variance for period \(t\) calculated from returns observed until 
period \(t - 1\). When the RWH is true for the process generating returns we may 
also expect the hypothesis to be true for the process generating rescaled returns. 
When the RWH is false, the autocorrelations of the random variables that generate 
returns and rescaled returns can differ by important amounts. Reasons for this and 
some discussion of the implications are included in Sections 6.2, 6.9, and 6.13. 
The ARCH models defined in Chapter 9 provide a framework for the calculation 
of the conditional variances \(h_t\). Here we use a specification that has the advantage 
of simplicity at the cost of probably specifying a suboptimal model for \(h_t\). Results 
are given for exponentially weighted moving averages, parametrized by a power \(p\), 

\[p = 1: \quad \sqrt{h_t} = (1 - \gamma)\sqrt{h_{t-1}} + c\gamma\,|r_{t-1} - \bar{r}|,\]
\[p = 2: \quad h_t = (1 - \gamma)h_{t-1} + c\gamma\,(r_{t-1} - \bar{r})^2. \tag{5.21}\]

There are also two positive parameters, \(c\) and \(\gamma\). When \(p = 1\), the variance 
estimate is proportional to the square of the conditional mean absolute deviation 
(Taylor 1980, 1986, 2000). When \(p = 2\) and \(c = 1\), we obtain the integrated 
GARCH(1, 1) model of Engle and Bollerslev (1986). The selected specification 
for \(h_t\) has the advantage that it is robust against nonstationarity created by changes 
in the unconditional variances of returns. It is also robust against outliers when 
\(p = 1\). 

The parameters \(p\), \(c\), \(\gamma\) are selected by maximizing the log-likelihood function 
when it is assumed that returns have conditional normal distributions with means 
and variances given by \(\bar{r}\) and \(h_t\). The log-likelihood function equals 

\[\log L = -\frac{1}{2} \sum_{t} \Big[\log(2\pi) + \log(h_t) + \frac{(r_t - \bar{r})^2}{h_t}\Big].\]


It is defined for the general ARCH model in Section 9.5. For a given pair \(p\) and 
\(\gamma\), the optimal \(c\) makes the average of the squared rescaled returns equal to one. 
Thus the maximization only involves searching for the best \(\gamma\) for each power \(p\) 
considered. The recursive equations must be initialized in some way. Here we use 
an appropriate average, calculated from the first twenty returns as 

\[\frac{1}{20} \sum_{t=1}^{20} |r_t - \bar{r}|^p.\]


The log-likelihood has been maximized for the twenty-two series of daily returns 
discussed in Section 5.5. The most recent data are for two series covering the 
period from 1991 to 2000. The best choice for both is \(p = 2\), with \(\gamma = 0.042\) 
for the S&P 100 index series and \(\gamma = 0.028\) for the spot Deutsche mark series. 
For the other twenty series, the optimal power \(p\) is one for eleven series; also, the 
optimal \(\gamma\) averages 0.05 with a range from 0.02 to 0.13. 
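
The rescaling procedure can be sketched in Python as follows. Several details are assumptions made for this illustration and are not confirmed by the text: the function names, the use of period 21 as the first period with a conditional variance, the inclusion of the factor c in the start-up value (so that the conditional variances scale exactly with c), and the restriction of the likelihood to periods after the first twenty.

```python
import numpy as np

N_INIT = 20  # the first twenty returns are used to start the recursion


def ewma_variances(returns, gamma, p=2, c=1.0):
    """Conditional variances h_t from the recursions in (5.21), for p = 1 or 2."""
    r = np.asarray(returns, dtype=float)
    dev_p = np.abs(r - r.mean()) ** p
    g = np.empty(r.size)                              # g_t = h_t ** (p / 2)
    g[:N_INIT + 1] = c * dev_p[:N_INIT].mean()        # assumed start-up value
    for t in range(N_INIT + 1, r.size):
        g[t] = (1.0 - gamma) * g[t - 1] + c * gamma * dev_p[t - 1]
    return g ** (2.0 / p)


def optimal_c(returns, gamma, p=2):
    """Value of c that makes the average squared rescaled return equal one."""
    r = np.asarray(returns, dtype=float)
    h1 = ewma_variances(r, gamma, p, c=1.0)
    m = np.mean((r - r.mean())[N_INIT:] ** 2 / h1[N_INIT:])
    return m ** (p / 2.0)                             # since h_t(c) = c**(2/p) * h_t(1)


def rescaled_returns(returns, gamma, p=2):
    """Rescaled returns of equation (5.20) for periods after the start-up block."""
    r = np.asarray(returns, dtype=float)
    h = ewma_variances(r, gamma, p, optimal_c(r, gamma, p))
    return (r - r.mean())[N_INIT:] / np.sqrt(h[N_INIT:])


def log_likelihood(returns, gamma, p=2):
    """Conditional normal log-likelihood evaluated at the optimal c."""
    r = np.asarray(returns, dtype=float)
    h = ewma_variances(r, gamma, p, optimal_c(r, gamma, p))[N_INIT:]
    e2 = (r - r.mean())[N_INIT:] ** 2
    return -0.5 * np.sum(np.log(2.0 * np.pi) + np.log(h) + e2 / h)
```

Under these assumptions the search described above reduces to evaluating `log_likelihood` over a grid of \(\gamma\) values for each power \(p\) and keeping the best pair.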


5.7.2 Further Autocorrelation Variance Estimates 


The variances of autocorrelations calculated from rescaled returns are estimated 
by replacing returns by rescaled returns in equation (5.9). These estimates of 
\(n\,\mathrm{var}(\mathrm{cor}(r_t^*, r_{t+\tau}^*))\) are denoted by \(b_\tau^*\). We should expect these estimates to be 
close to one when there is no linear dependence in the squares of rescaled returns 
or, equivalently, when ARCH effects have been eliminated by the rescaling 
transformation. 

Figures 5.3 and 5.4 show the estimates \(b_\tau^*\), respectively for the S&P 100 index 
series and the spot Deutsche mark series, marked by squares and joined by dotted 
lines. It can be seen that the estimates for rescaled returns are much nearer to one 
than the corresponding estimates \(b_\tau\) for returns. 


5.7.3 The Autocorrelations of Rescaled Returns 


Figures 5.1 and 5.2 show the autocorrelations of the rescaled returns, again joined 
by dotted lines, for comparison with the autocorrelations of the returns. The two 
sets of autocorrelations look similar, on each figure, but they differ by enough 
to produce nontrivial differences between their variance ratios, as noted below. 
For the S&P 100 index data, it is seen that the returns have five out of thirty 
autocorrelations beyond the dotted lines at \(\pm 1.96/\sqrt{n} = \pm 0.04\), but the rescaled 
returns only have two such autocorrelations. This is a consequence of the higher 
variability of the estimates from returns, i.e. \(b_\tau > b_\tau^*\) as shown in Figure 5.3. 

Table 5.3. Autocorrelations for rescaled returns. 

                                 Autocorrelations, lags 1-5          Category frequency, lags 1-30 
Series                          1       2       3       4       5       1    2    3    4    5    6 
S&P 500-share            S    0.119   0.013   0.008  -0.017   0.004     0    0   17   12    0    1 
S&P 500-share            F    0.001  -0.003   0.003  -0.026   0.000     0    0   16   14    0    0 
Coca Cola                S    0.006  -0.035  -0.003  -0.025   0.003     0    0   16   14    0    0 
General Electric         S   -0.012  -0.003  -0.005  -0.008  -0.002     0    0   17   13    0    0 
General Motors           S    0.042  -0.041   0.010  -0.017   0.006     0    0   10   20    0    0 
FT 100-share             S    0.084   0.014   0.025   0.031   0.028     0    0   13   16    1    0 
FT 100-share             F    0.034  -0.013   0.006   0.030   0.036     0    0   14   16    0    0 
Glaxo                    S    0.125  -0.003   0.004  -0.007   0.033     0    0   14   15    0    1 
Marks & Spencer          S    0.066  -0.004  -0.032   0.017   0.001     0    0   20    9    1    0 
Shell                    S    0.105   0.028   0.004  -0.017   0.008     0    0   15   14    0    1 
Nikkei 225-share         S    0.078   0.000   0.008   0.011   0.016     0    0    6   23    1    0 
Treasury bonds           F    0.039   0.013  -0.024  -0.017  -0.008     0    0   11   19    0    0 
3-month sterling bills   F    0.103   0.013   0.024   0.011   0.004     0    0   12   17    0    1 
DM/$                     F    0.013   0.015   0.025  -0.005   0.027     0    0    9   21    0    0 
Sterling/$               F    0.02   -0.021  -0.005   0.006   0.020     0    0    9   21    0    0 
Swiss franc/$            F   -0.000  -0.013   0.017  -0.012   0.013     0    1   10   18    1    0 
Yen/$                    F    0.006   0.013   0.024   0.005   0.044     0    0    7   23    0    0 
Gold                     F   -0.021   0.026   0.018  -0.040   0.001     0    1   14   14    1    0 
Corn                     F    0.090  -0.039   0.034   0.015   0.002     0    0    6   22    2    0 
Live cattle              F    0.009  -0.006   0.047   0.012   0.018     0    0   14   16    0    0 

Autocorrelation averages and frequency totals 
Spot series              S    0.068  -0.003   0.002  -0.004   0.011     0    0  128  136    3    3 
Future series            F    0.028  -0.001   0.015  -0.002   0.014     0    2  122  201    4    1 
All series                    0.046  -0.002   0.009  -0.003   0.012     0    2  250  337    7    4 
All series, crash excluded    0.041  -0.001   0.005  -0.001   0.010     0    3  255  330    9    3 

The six categories that summarize the signs and magnitudes of the autocorrelations are (1) below -0.1, 
(2) between -0.1 and -0.05, (3) between -0.05 and 0, (4) between 0 and 0.05, (5) between 0.05 and 
0.1, and (6) above 0.1. The final row of averages presents average values after all returns in the week 
from 19 to 23 October 1987 have been excluded. 

A comparison of the autocorrelations for the other twenty series of daily returns 
(Table 4.8) with those for rescaled returns (Table 5.3) shows that the latter have less 
dispersion and are more often positive. For returns, 20 of the 600 autocorrelations 
are above 0.05 at lags 1-30, but only 11 are this high for rescaled returns; 16 are 
less than —0.05 for returns, compared with only 2 for rescaled returns. Of the 
270 autocorrelation estimates from the 9 spot series at lags 1-30, there are 128 
positive autocorrelations for returns (47%) and 145 for rescaled returns (54%). 
For the 330 estimates from the 11 futures series, the positive estimates number 
180 for returns (55%) and 206 for rescaled returns (62%). 

Figure 5.5. First-lag autocorrelations. 

Figure 5.6. Autocorrelations for rescaled returns from three spot indices. 


Figure 5.5 compares the first-lag estimates for returns and rescaled returns. 
Eighteen of the twenty circles are above the dotted line and hence indicate higher 
estimates from the rescaled data. The higher estimates demand an explanation. 
The simplest has two parts: first, the RWH is false for several series and, second, 
rescaling the returns magnifies the predictable component. The magnification 
effect is demonstrated for suitable autocorrelated processes by Monte Carlo and 
theoretical methods in Sections 6.9 and 6.13 respectively. 

Figure 5.6 displays the autocorrelations of rescaled returns calculated from 
spot stock indices. The S&P 500, FTSE, and Nikkei series all have a substantial 
positive estimate at the first lag, followed by much smaller estimates. Almost all of 
the index estimates from lag two onwards are within the dotted lines, at ±0.04, so 
that tests based upon only one of these estimates will probably accept the RWH. 
The Nikkei estimates, shown by circles, are positive for all of the first 12 lags. The 
Nikkei total of 24 positive estimates at lags 1-30 compares with 17 positive FTSE 
autocorrelations (diamonds) and only 13 positive S&P autocorrelations (squares). 

Figure 5.7. Average autocorrelations for rescaled returns. 

Figure 5.7 emphasizes the generally positive and very small dependence in the 
rescaled returns for futures series. The average estimates for the eleven futures 
series (marked by circles) are positive for 13 of the first 15 lags. The futures 
averages are greater than or almost equal to the averages for the nine spot series 
(marked by squares) for all lags from 2 to 15 inclusive. 


5.7.4 Variance-Ratio Test Results for Rescaled Returns 


The variance ratios for the rescaled returns are as follows for the two most recent 
data series, with the ratios for the returns shown in brackets: 

                   N = 2            N = 5            N = 20 
S&P 100 index      1.004 (0.976)    0.957 (0.905)    0.840 (0.759) 
Spot DM/$          1.034 (1.108)    1.069 (1.042)    1.143 (1.036) 

The corresponding random walk test statistics $z_N$ also change: 

                   N = 2            N = 5            N = 20 
S&P 100 index      0.17 (-0.73)     -0.92 (-1.41)    -1.58 (-1.76) 
Spot DM/$          1.59 (0.73)      1.55 (0.80)      1.41 (0.32) 


None of these test values rejects the RWH at the 5% significance level. The 
rescaling of returns increases all six test values, by amounts that range from 0.18 
to 1.09. 

Test values $z_N$ for the other twenty series of daily returns are shown in Table 5.2. 
All but one of the sixty test values are increased by rescaling and several of the 
increases are substantial. These increases reflect the changes in (at least) the first- 
lag sample autocorrelations (noted above and in Figure 5.5) and reductions in the 
standard errors of the variance ratios. Eight of the test values for rescaled returns 
are significant at the 5% level when $N = 2$, each indicating significant positive 
first-lag dependence. There are seven significant values when $N = 5$ and eight 
when $N = 20$. The total of 23 significant test values for rescaled returns compares 
with a total of 12 for the returns. 

Variance-ratio test statistics are tabulated for several further series of daily 
rescaled returns in Taylor (2000). All values are significant at the 5% level for 
five series of daily returns from the Dow Jones Industrial Average index, which 
together cover the period from 1897 to 1988. The five test values $z_N$ range from 
2.04 to 10.22 when $N = 2$, from 2.98 to 7.74 when $N = 5$, and from 4.05 to 6.42 
when $N = 20$. The test values for the series covering the latest period, from 1968 
to 1988, are 10.22, 7.74, and 4.16. All of these values reject the RWH at very low 
significance levels. 


5.8 Summary 


The hypothesis that the returns generating process has a constant mean and is 
uncorrelated can be tested by comparing variances for two investment horizons. 
The ratio of two variance estimates differs significantly from the random walk 
prediction for several time series, including equity index series. Reducing condi- 
tional heteroskedasticity in the test data enables more evidence against the random 
walk hypothesis to be found. 


6 Further Tests of the Random Walk Hypothesis 


Several test statistics are defined and evaluated for twenty time series of daily 
returns in this chapter. Significant, positive dependence is found in a majority 
of the series, and almost no dependence in the remaining series. The results are 
consistent with substantial variation in the power of the tests to detect the small 
dependence present in some of the series. 


6.1 Introduction 


A variety of random walk tests have been motivated by particular alternatives to 
randomness. These alternatives include trends in prices, mean-reversion in prices, 
cyclical patterns, long-range dependence, and chaotic dynamics. This chapter cov- 
ers several test statistics and compares their results with those of the variance-ratio 
tests described in the previous chapter. The tests are described in Sections 6.3-6.7, 
after a review of general methodological issues in Section 6.2. 

Autocorrelation tests and test statistics that are asymptotically equivalent to 
functions of autocorrelations include the first-lag and variance-ratio tests. Further 
autocorrelation tests are reviewed in Section 6.3. The list of tests begins with the 
portmanteau Q-statistic of Box and Pierce (1970), which makes no assumptions 
about the alternative to randomness. It continues with a test that simply counts 
significant autocorrelations. The third test uses the T-statistic of Taylor (1982a), 
which has power to detect trends and which is similar to the variance-ratio test. 
The fourth and fifth tests are respectively regressions of multi-period and single- 
period returns upon multi-period returns. These tests of Fama and French (1988) 
and Jegadeesh (1991) have the power to detect slow mean-reversion in prices. 

Section 6.4 covers tests based upon the spectral density function. The original 
motivation for these tests was an alternative to randomness that incorporates 
cyclical patterns in returns. We consider three tests, computed first from the density 
at zero frequency, second from the density at the frequency of a weekly cycle in 
daily data, and third from the number of significant peaks and troughs in a plot 
of the density (Praetz 1979). 

Section 6.5 describes the nonparametric runs test. It is one of the oldest tests in 
the literature (Fama 1965) and has power against MA(1) alternatives. Section 6.6 
defines rescaled range tests, first developed by Mandelbrot (1972) with the 
intention of discovering dependence between observations that are far apart and later 
revised by Lo (1991). Finally, Section 6.7 contains a discussion of the 
nonparametric BDS test of Brock, Dechert, and Scheinkman (1987), which has the power 
to detect chaotic dynamics and many other alternatives to an i.i.d. process. 

The results from all the random walk tests applied to daily data are discussed 
together in Section 6.8 after describing all the tests. When nineteen tests are eval- 
uated for each of the twenty series, 32% of the test values reject the random walk 
hypothesis (RWH) at the 5% level. The rejection frequency rises to 65% for the 
hybrid test of Taylor (1986), which uses the first autocorrelation $\hat{\rho}_1$ for equity 
series and the trend statistic T for all other series. The Monte Carlo results in Sec- 
tion 6.9 show that the considerable variations in rejection frequencies across tests 
can be explained by variations in their power to detect small levels of dependence. 

Some dependence in returns can be created first by variation in equilibrium 
expected returns, second by bid-ask spreads, and third by any rules that limit 
price movements. The magnitude of the dependence is discussed in Section 6.10. 
Section 6.11 concludes the two chapters about tests of the RWH. 


6.2 Test Methodology 
6.2.1 Test Size 


A test has correct size if the significance level is the maximum probability of a 
Type I error. The Type I error in Chapters 5 and 6 is to reject the RWH when 
this null hypothesis is correct. The test size will be increased if the conditional 
heteroskedastic (ARCH) feature of the returns process is ignored. This can be 
a serious problem. For example, suppose returns are uncorrelated, the first-lag 
autocorrelation has distribution $\hat{\rho}_1 \sim N(0, 2/n)$ for a sample of $n$ returns, we 
incorrectly assume $\hat{\rho}_1 \sim N(0, 1/n)$, and the significance level is 5%. Then the 
probability of rejecting the RWH, when $\hat{\rho}_1$ is the (two-tailed) test statistic, is given 
by 

$$P(\sqrt{n}\,|\hat{\rho}_1| > 1.96), \quad \text{with } \sqrt{n}\,\hat{\rho}_1 \sim N(0, 2).$$

With $Z \sim N(0, 1)$, this probability equals $P(|Z| > 1.96/\sqrt{2}) = 17\%$. The true 
size of the test is then more than three times the nominal size. Thus, if we test 
that returns have independent and identical distributions (i.i.d.) using the 
autocorrelations of returns, and reject i.i.d., then we cannot reject the RWH at the same 
significance level. 
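
The size calculation above can be checked numerically. The following is a minimal sketch, assuming only the normal approximation stated in the example; the function names `normal_cdf` and `true_size` are illustrative and do not come from the text.

```python
import math

def normal_cdf(x):
    """Cumulative distribution function of N(0, 1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def true_size(b, critical=1.96):
    """Two-tailed rejection probability when sqrt(n) * rho_hat ~ N(0, b)
    but the test is carried out as if the variance were 1."""
    return 2.0 * (1.0 - normal_cdf(critical / math.sqrt(b)))

size_correct = true_size(1.0)   # variance correctly specified
size_arch = true_size(2.0)      # variance is twice the assumed value

print(f"b = 1: size = {size_correct:.3f}")
print(f"b = 2: size = {size_arch:.3f}")
```

With b = 2 the rejection probability is roughly 17%, as in the example, while b = 1 recovers the nominal 5%.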

There are at least three ways to eliminate the consequences of ARCH effects 
from random walk tests. First, standard errors of test variables can be estimated 
with the assumption that the RWH is true and then these standard errors can be 
built into the test statistics. This strategy has already been seen for variance-ratio 
tests in Section 5.3. Second, returns can be transformed to rescaled returns with 
the intention of obtaining a time series for which asymptotic linear theory is more 
applicable, as in Section 5.7. This second method is compared with the first in 
Section 6.8. Third, an ARCH model can be developed that contains time-varying 
conditional means and then the RWH can be associated with parameter restrictions 
that can be tested using robust tests. The ARCH strategy is complicated by the 
necessity of modeling the conditional variance process. Some typical results are 
given in Sections 10.4 and 10.5. 

Test procedures that contain an unspecified parameter will have spurious size 
if the parameter is chosen to make the test statistic as extreme as possible. For 
example, a comparison of the variances of N-period and single-period returns 
requires selection of N. Chow and Denning (1993) document the consequences 
for test size when the most extreme variance-ratio test statistic is selected and 
they show how an appropriate test can then be performed. Likewise, Richardson 
(1993) shows that tests based on the first-lag correlation of N-period returns have 
spurious size if N is chosen to maximize the correlation observed. His Monte 
Carlo results show that some evidence for dependence in annual stock returns may be 
spurious. A variation on the theme of focusing on extreme results is the confusing 
situation when some autocorrelations are significant but several are not; a selective 
choice of lags can then determine the test result. 

The choice of significance level is always arbitrary. A 5% level is used for all of 
the tests in this chapter. There is the possibility that Type I errors are more likely 
than Type II errors for long series. For example, suppose 2500 returns come from 
an MA(1) process and that the RWH is rejected when the first-lag autocorrelation 
exceeds $1.65/\sqrt{n} = 0.033$. Also, suppose standard i.i.d. theory applies. Then the 
probability of a Type I error when the moving-average parameter is zero is 5%, but 
the probability of a Type II error when the parameter equals 0.07 is less than 5%. 
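
The two error probabilities in this example can be sketched as follows. The calculation assumes, as a simplification that is mine rather than the text's, that the first-lag estimate is distributed as $N(\rho_1, 1/n)$ under the MA(1) alternative, with $\rho_1 = \theta/(1 + \theta^2)$ for moving-average parameter $\theta$. All variable names are illustrative.

```python
import math

def normal_cdf(x):
    """Cumulative distribution function of N(0, 1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

n = 2500
theta = 0.07                          # moving-average parameter under the alternative
critical = 1.65 / math.sqrt(n)        # reject the RWH when rho_hat exceeds this value
rho_1 = theta / (1.0 + theta ** 2)    # first-lag autocorrelation of an MA(1) process

# Type I error: rejecting when the moving-average parameter is zero.
type_one = 1.0 - normal_cdf(1.65)

# Type II error: accepting when the parameter equals theta
# (assumes rho_hat ~ N(rho_1, 1/n) under the alternative).
type_two = normal_cdf((critical - rho_1) * math.sqrt(n))

print(f"critical value = {critical:.3f}")
print(f"P(Type I)  = {type_one:.3f}")
print(f"P(Type II) = {type_two:.3f}")
```

Under these assumptions the Type II probability is close to 3%, below the Type I probability of about 5%.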


6.2.2 Test Power 


A test has optimal power if the probability of a Type II error is minimized. The 
Type II error in Chapters 5 and 6 is to accept the RWH when the hypothesis is false. 
Optimal tests can only be designed for specific alternatives to a random process, 
for example, an MA(1) process with positive dependence. Many alternatives 
encompass too many parameters (or even model specifications) to permit the 
development of optimal tests. For example, the ARMA(1, 1) models that follow 
from price-trend models (Taylor 1982a) or from mean-reversion models (Poterba 
and Summers 1988) have too many parameters for the construction of uniformly 
most powerful tests. It is necessary to aim lower and to seek tests that have 
relatively high power for plausible returns processes and parameter levels. 


Figure 6.1. Accept and reject regions for three tests. (Axes: first-lag and second-lag autocorrelations.) 


Test power depends on the test statistic, the alternative to randomness, the signif- 
icance level, and the number of observations. Power values can vary substantially 
across test statistics because they reject the RWH under different circumstances. 

To illustrate some of the issues, consider the task of building a test statistic 
from the first two sample autocorrelations, $\hat{\rho}_1$ and $\hat{\rho}_2$. If we have no idea how 
returns may depart from randomness, then we may decide to reject the RWH for 
large values of $Q = \hat{\rho}_1^2 + \hat{\rho}_2^2$. If we believe the only credible alternative is an 
MA(1) process with positive dependence, then we will reject the RWH for large 
values of $R = \hat{\rho}_1$. We may, however, believe the alternative to randomness is 
positive dependence at two or more lags and then reject the RWH for large values 
of $S = \hat{\rho}_1 + \hat{\rho}_2$. Figure 6.1 shows the regions in which the three test statistics, 
Q, R, and S, accept or reject the RWH when there are 2500 observations, the 
autocorrelations have standard errors equal to 0.02 under the null hypothesis, and 
each test has a 5% significance level. In this figure, the RWH is rejected if the pair 
$(\hat{\rho}_1, \hat{\rho}_2)$ is outside the circle marked Q, or is to the right of the line marked R, or 
is above the line marked S. The number of tests that reject the RWH can be any 
number between 0 and 3, depending on the outcome $(\hat{\rho}_1, \hat{\rho}_2)$. At the two points 
marked a, exactly one of Q and R rejects the RWH; likewise, exactly one of two 
tests rejects at the points marked b for Q and S and at the points marked c for 
R and S. It would be wrong to look at the figure and say that Q is best because 
it rejects the RWH in a larger fraction of the diagram than the other tests. The 
important criterion is the probability of being within a rejection region and these 
probabilities will vary depending on the alternative to randomness. 
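
The three rejection regions can be written down explicitly. This is a sketch under assumptions that go slightly beyond the text: the two estimates are treated as independent normal variables with standard error 0.02, Q is compared with the 95% point of a chi-squared distribution with two degrees of freedom, and R and S are one-tailed tests with critical value 1.645. The function name `decisions` is illustrative.

```python
import math

SE = 0.02                          # standard error of each autocorrelation under the null
CHI2_2_95 = -2.0 * math.log(0.05)  # 95% point of chi-squared with 2 d.f. (about 5.99)
Z_95 = 1.645                       # one-tailed 5% point of N(0, 1)

def decisions(rho1, rho2):
    """Return which of the tests Q, R and S reject the RWH for a pair of estimates."""
    q = (rho1 ** 2 + rho2 ** 2) / SE ** 2        # outside the circle marked Q
    r = rho1 / SE                                # to the right of the line marked R
    s = (rho1 + rho2) / (SE * math.sqrt(2.0))    # above the line marked S
    return {"Q": q > CHI2_2_95, "R": r > Z_95, "S": s > Z_95}

for pair in [(0.0, 0.0), (0.04, 0.0), (0.03, 0.03), (-0.05, 0.0), (0.05, 0.05)]:
    print(pair, decisions(*pair))
```

The example pairs show that any number of the three tests, from none to all, can reject for a given outcome.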

Test power also depends on the strategy used to remove ARCH effects. To 
illustrate this, suppose the RWH is tested using the first-lag autocorrelation of 
returns and that $\hat{\rho}_1 \sim N(\rho, b/n)$, with $b > 1$ a known number that does not depend 
on $\rho > 0$. Then the RWH can be rejected at the 5% level if $\hat{\rho}_1 > 1.65\sqrt{b/n}$ and 
the probability of a Type II error is 

$$p_1(\rho) = \Phi(1.65 - \rho\sqrt{n/b})$$

with $\Phi(\cdot)$ the cumulative distribution function of $N(0, 1)$. Another strategy is to 
calculate the first-lag autocorrelation of rescaled returns, $\hat{\rho}_1^*$, and to hope that this 
reduces the variability of the estimate, to give $\hat{\rho}_1^* \sim N(\rho^*, 1/n)$ for some $\rho^*$ that 
depends on $\rho$ and the process that generates returns. The probability of a Type II 
error is then 

$$p_2(\rho) = \Phi(1.65 - \rho^*\sqrt{n}).$$

The rescaling method will enhance the test's power if $p_2(\rho) < p_1(\rho)$. This 
condition is equivalent to $\rho < \rho^*\sqrt{b}$. As $b > 1$ and there is evidence that 
$\rho^* > \rho > 0$ when the RWH is false (e.g. Table 5.2), it follows that rescaling 
will probably increase test power. The number of observations required to obtain 
a power of 50% is proportional to $b/\rho^2$. If rescaling halves $b$ and doubles $\rho$, 
then the 50% power level can be attained with only one-eighth of the number of 
observations. 
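
The one-eighth claim follows directly from the Type II error formula, as the sketch below shows. The numerical values (b = 2 and an autocorrelation of 0.02 for returns, b = 1 and 0.04 for rescaled returns) are hypothetical choices made only to illustrate "halving b and doubling the autocorrelation"; the function names are likewise illustrative.

```python
import math

def normal_cdf(x):
    """Cumulative distribution function of N(0, 1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def type_two_error(rho, n, b=1.0):
    """P(accept RWH) when rho_hat ~ N(rho, b/n) and we reject if rho_hat > 1.65 sqrt(b/n)."""
    return normal_cdf(1.65 - rho * math.sqrt(n / b))

def n_for_half_power(rho, b=1.0):
    """Sample size at which power is 50%, i.e. rho * sqrt(n / b) = 1.65."""
    return 1.65 ** 2 * b / rho ** 2

# Hypothetical values: rescaling halves b (2 -> 1) and doubles rho (0.02 -> 0.04).
n_returns = n_for_half_power(rho=0.02, b=2.0)
n_rescaled = n_for_half_power(rho=0.04, b=1.0)
ratio = n_returns / n_rescaled

print(f"returns:  n = {n_returns:.1f}")
print(f"rescaled: n = {n_rescaled:.1f}")
print(f"ratio = {ratio:.1f}")
```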


6.2.3 Multiple Tests 


It is fairly inevitable that people will try many methods on the same data. Usually, 
the RWH is rejected for some tests but not for others. An obvious solution is 
to reduce the significance level for each test and then to reject the RWH if one 
or more tests gives a significant result. The correct way to do this is typically 
unknown because the various test statistics do not provide independent results. 
The special case when all test statistics are linear combinations of autocorrelations 
is studied in Richardson and Smith (1994b). 

A classical methodology would avoid multiple tests altogether. The researcher 
would identify a preferred alternative to random behavior, then develop a powerful 
test, and finally apply it. However, few researchers would not be interested in the 
results from other tests. A compromise methodology is followed in this chapter. 
Favored alternative hypotheses are selected for each series and the respective 
powerful tests are given first priority in Section 6.8. 


6.3 Further Autocorrelation Tests 


Many random walk test statistics are a function of $k$ autocorrelation estimates 
$\{\hat{\rho}_1, \hat{\rho}_2, \ldots, \hat{\rho}_k\}$ calculated from $n$ returns. We now review several tests that 
supplement the first-lag and variance-ratio tests described in Chapter 5. It is supposed 
that the returns also provide estimates $b_\tau$ of $n\,\mathrm{var}(\hat{\rho}_\tau)$ that are asymptotically 
consistent when the RWH is true. All the following test statistics can be easily adapted 
when rescaled returns are used instead of returns. 


6.3.1 The General Q-Test 
The Q-statistic of Box and Pierce (1970) evaluated in Chapter 4 is defined as 

$$Q_k = n \sum_{\tau=1}^{k} \hat{\rho}_\tau^2.$$

The asymptotic distribution of $Q_k$ is $\chi_k^2$ when the returns process is i.i.d., because 
the estimates $\hat{\rho}_\tau$ are asymptotically independent. The expected value of $Q_k$ 
exceeds $E[\chi_k^2] = k$ when the RWH is true and the process is conditionally 
heteroskedastic. The Q-test is then inappropriate. A revised statistic that has a 
satisfactory size is given by 

$$Q_k = n \sum_{\tau=1}^{k} \hat{\rho}_\tau^2 / b_\tau. \tag{6.1}$$

This statistic does have asymptotic distribution $\chi_k^2$ when the RWH is true. The 
revised Q-statistic will have relatively low power for specific alternatives because 
it is intended to have power for any alternative to randomness that can be detected 
using autocorrelations. 

The choice of $k$ is arbitrary. Results are given for $k = 10$, 30, and 50. A single 
Q-statistic rejects the RWH at the 5% level if $Q_{10} > 18.31$, $Q_{30} > 43.77$, or 
$Q_{50} > 67.50$. 
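
Both versions of the Q-statistic are simple to compute. The sketch below is illustrative only: it assumes the estimate $b_\tau = n \sum (r_t - \bar{r})^2 (r_{t+\tau} - \bar{r})^2 / [\sum (r_t - \bar{r})^2]^2$, which is one common heteroskedasticity-consistent choice and may differ in detail from the estimate defined in Chapter 5. The function names and the simulated data are hypothetical.

```python
import random

def autocorr_and_b(returns, k):
    """Sample autocorrelations rho_hat and assumed estimates b of n * var(rho_hat), lags 1..k."""
    n = len(returns)
    mean = sum(returns) / n
    d = [r - mean for r in returns]
    ss = sum(x * x for x in d)
    rho, b = [], []
    for tau in range(1, k + 1):
        rho.append(sum(d[t] * d[t + tau] for t in range(n - tau)) / ss)
        b.append(n * sum((d[t] * d[t + tau]) ** 2 for t in range(n - tau)) / ss ** 2)
    return rho, b

def q_statistics(returns, k):
    """Return the Box-Pierce statistic and the revised statistic of equation (6.1)."""
    n = len(returns)
    rho, b = autocorr_and_b(returns, k)
    q_box_pierce = n * sum(r * r for r in rho)
    q_revised = n * sum(r * r / bt for r, bt in zip(rho, b))
    return q_box_pierce, q_revised

random.seed(1)
sample = [random.gauss(0.0, 1.0) for _ in range(2000)]   # simulated i.i.d. "returns"
q10, q10_revised = q_statistics(sample, 10)
print(f"Q_10 = {q10:.2f}, revised Q_10 = {q10_revised:.2f}, 5% critical value = 18.31")
```

For i.i.d. data the estimates $b_\tau$ are close to one, so the two statistics are similar; they diverge when the data are conditionally heteroskedastic.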


6.3.2 The Number of Significant Autocorrelations 


Given many autocorrelations, some will probably appear to be significant regard- 
less of whether or not the RWH is true. Finding a few significant values does 
not tell us much, unless their lags have a theoretical explanation or some overall 
test is significant. A simple test rejects the RWH if N or more of the first k 
autocorrelations are significant at the 5% level, with N chosen to ensure the test has 
correct size. It can be assumed that the number of significant autocorrelations has 
a binomial distribution when the RWH is true. 

Results are given later when $k = 28$ and $N = 4$. Let $N_r$ count the number of 
times we observe $|\hat{\rho}_\tau| > 1.96\sqrt{b_\tau/n}$ for lags 1-28 inclusive. Then the RWH is 
rejected at the 5% level if $N_r \ge 4$. 
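
The choice N = 4 can be checked from the binomial distribution. This sketch assumes, as the text does, that each of the 28 autocorrelations is significant with probability 0.05 independently of the others when the RWH is true; the function name `binomial_tail` is illustrative.

```python
from math import comb

def binomial_tail(k, n_min, p=0.05):
    """P(X >= n_min) when X ~ Binomial(k, p)."""
    return sum(comb(k, i) * p ** i * (1.0 - p) ** (k - i) for i in range(n_min, k + 1))

size_three = binomial_tail(28, 3)   # size if we rejected for three or more
size_four = binomial_tail(28, 4)    # size if we reject for four or more

print(f"P(X >= 3) = {size_three:.4f}")
print(f"P(X >= 4) = {size_four:.4f}")
```

Rejecting for three or more significant values would give a size well above 5%, whereas four or more gives a size just below 5%.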


6.3.3 The Price-Trend Test T 


The price-trend hypothesis described in Section 3.6 states that the returns 
generating process has the positive autocorrelations $\rho_\tau = A\phi^\tau$. Taylor (1982a) develops 
a powerful test of the RWH against this alternative by considering the logarithms 
of likelihood ratios for the vector $\hat{\rho} = (\hat{\rho}_1, \hat{\rho}_2, \ldots, \hat{\rho}_k)'$. These are 

$$l(A, \phi) = \log L(\hat{\rho} \mid \rho_\tau = A\phi^\tau) - \log L(\hat{\rho} \mid \rho_\tau = 0).$$

An accurate approximation to the ratio, when $A$ is small and the returns process 
is linear, is provided by 

$$l \approx nA \sum_{\tau=1}^{k} \phi^\tau \hat{\rho}_\tau + l_0$$

with $l_0$ a constant that does not depend on $\hat{\rho}$. The RWH should then be rejected if 

$$T_{k,\phi} = \sum_{\tau=1}^{k} \phi^\tau \hat{\rho}_\tau$$

is sufficiently large. 

The number of autocorrelations $k$ and the test parameter $\phi$ must be preselected. 
Taylor (1982a, 1986) recommended the choices $k = 30$ and $\phi = 0.92$, because 
these values deliver high test power for the ranges $0.01 < A < 0.04$ and 
$0.8 < \phi < 0.975$. These choices are retained here. Conditional heteroskedasticity can 
be accommodated by adjusting the variance of the test statistic. The asymptotic 
distribution of 

$$T = \frac{\sqrt{n}\sum_{\tau=1}^{30} 0.92^\tau \hat{\rho}_\tau}{\left(\sum_{\tau=1}^{30} 0.92^{2\tau} b_\tau\right)^{0.5}}$$

is $N(0, 1)$ when the RWH is true. The null hypothesis is rejected at the 5% 
significance level using a one-tail test whenever $T > 1.65$. When the autocorrelation 
variances can be assumed to be $1/n$, so $b_\tau = 1$, the test statistic becomes 

$$T = 0.4274\sqrt{n} \sum_{\tau=1}^{30} 0.92^\tau \hat{\rho}_\tau. \tag{6.2}$$
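
The price-trend statistic and the constant 0.4274 in (6.2) can be reproduced as follows. This is a minimal sketch: the function name `trend_statistic` and the example autocorrelations (generated with the hypothetical values A = 0.03 and n = 2500) are illustrative and are not results from the text.

```python
import math

def trend_statistic(rho, n, b=None, phi=0.92):
    """T = sum(phi^tau * rho_tau) divided by its standard error under the RWH.

    rho[tau - 1] is the sample autocorrelation at lag tau; b[tau - 1] estimates
    n * var(rho_tau) and defaults to 1 for every lag."""
    k = len(rho)
    if b is None:
        b = [1.0] * k
    numerator = sum(phi ** tau * rho[tau - 1] for tau in range(1, k + 1))
    variance = sum(phi ** (2 * tau) * b[tau - 1] for tau in range(1, k + 1)) / n
    return numerator / math.sqrt(variance)

# Scaling constant in (6.2): 1 / sqrt(sum of 0.92^(2 tau) over lags 1..30).
scale = 1.0 / math.sqrt(sum(0.92 ** (2 * tau) for tau in range(1, 31)))
print(f"scale = {scale:.4f}")

# Hypothetical autocorrelations that follow the price-trend pattern exactly.
example_rho = [0.03 * 0.92 ** tau for tau in range(1, 31)]
t_value = trend_statistic(example_rho, n=2500)
print(f"T = {t_value:.2f} (reject the RWH if T > 1.65)")
```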


This version of the test rejected the RWH for rescaled returns from currencies and 
commodities in Taylor (1982a, 1986). The test and several others are evaluated in 
the very detailed analysis of Japanese markets by Kariya, Tsukuda, Maru, Matsue, 
and Omaki (1995). 


6.3.4 The Multi-Period Autocorrelation Test 


Fama and French (1988) demonstrate that mean-reverting price components will 
induce more first-lag dependence in appropriate long-horizon returns than in 
short-horizon returns. This insight motivates their random walk test. 

The theoretical correlation between consecutive j-period returns is 

$$\rho_1^{(j)} = \mathrm{cor}(r_{t+1} + \cdots + r_{t+j},\; r_{t+j+1} + \cdots + r_{t+2j}) = \sum_{i=1}^{j}\sum_{l=1}^{j} \frac{\mathrm{cov}(r_{t+i}, r_{t+j+l})}{V(j)},$$

where $V(j)$ continues to denote the j-period variance. This expression simplifies 
to the following function of the first $2j - 1$ autocorrelations of single-period 
returns: 

$$\rho_1^{(j)} = \frac{\rho_1 + 2\rho_2 + \cdots + j\rho_j + \cdots + 2\rho_{2j-2} + \rho_{2j-1}}{j + 2\sum_{i=1}^{j-1}(j - i)\rho_i}. \tag{6.3}$$

Multi-period first-lag autocorrelations calculated from observed data are thus 
expected to be similar to a linear function of single-period autocorrelations. 
The test is implemented here by evaluating the sample estimate of the linear 
function in the numerator of $\rho_1^{(j)}$ divided by its estimated standard error; thus, 

$$w_\tau = \min(\tau, 2j - \tau), \qquad F(j) = \frac{\sum_{\tau=1}^{2j-1} w_\tau \hat{\rho}_\tau}{\left(\sum_{\tau=1}^{2j-1} w_\tau^2 b_\tau / n\right)^{0.5}}.$$

The test statistic $F(j)$ has an asymptotic standard normal distribution when the 
RWH is true. An alternative implementation of the test estimates $\rho_1^{(j)}$ from 
overlapping j-period returns and then divides by a consistent standard error. 

Fama and French (1988) show that the first-lag autocorrelations of 36-, 48-, 
and 60-month returns calculated from monthly returns on US stock portfolios 
from 1926 to 1985 are negative and often less than -0.3. The significance of the 
results has been debated by Richardson (1993), who investigates the distribution 
of the maximum of $|\hat{\rho}_1^{(j)}|$. 

The choice of $j$ requires an understanding of plausible parameters for mean- 
reverting models. As the test was designed for long-horizon returns, results using 
daily data are given later for $j = 125$, so that the test results are similar to a test 
for dependence in six-month returns. 
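
The weights $w_\tau = \min(\tau, 2j - \tau)$ simply count how many pairs of single-period returns, one from each j-period return, are separated by lag $\tau$. A minimal sketch of the statistic follows, with illustrative function names and a hypothetical set of autocorrelations; it is not the author's code.

```python
import math

def multi_period_weights(j):
    """Weights min(tau, 2j - tau) for lags tau = 1, ..., 2j - 1."""
    return [min(tau, 2 * j - tau) for tau in range(1, 2 * j)]

def f_statistic(rho, n, j, b=None):
    """F(j): weighted sum of the first 2j - 1 autocorrelations over its standard error.

    rho[tau - 1] is the sample autocorrelation at lag tau; b defaults to 1 at every lag."""
    w = multi_period_weights(j)
    if len(rho) < len(w):
        raise ValueError("need at least 2j - 1 autocorrelations")
    if b is None:
        b = [1.0] * len(w)
    numerator = sum(wt * r for wt, r in zip(w, rho))
    variance = sum(wt * wt * bt for wt, bt in zip(w, b)) / n
    return numerator / math.sqrt(variance)

print(multi_period_weights(3))           # [1, 2, 3, 2, 1]

# Hypothetical example: a small negative autocorrelation at every lag up to 249.
f_value = f_statistic([-0.01] * 249, n=2500, j=125)
print(f"F(125) = {f_value:.2f}")
```

Because the weights sum to $j^2$, even very small autocorrelations of one sign can produce a large value of $F(j)$ when $j$ is large.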


6.3.5 A Multi-Period Regression Test 


Jegadeesh (1991) considered the general regression test of the RWH given by 
regressing the k-period return, from times $t$ to $t + k - 1$, upon the latest j-period 
return, from times $t - j$ to $t$. Jegadeesh assumed the alternative to the RWH is 
mean-reversion in prices and then showed that, asymptotically, $k = 1$ maximizes 
the ratio of the regression slope divided by its standard error. This ensures that 
$k = 1$ maximizes the power of the test for sufficiently large samples (Geweke 
1981). A large value of $j$ is appropriate when any mean-reversion in prices occurs 
slowly. 

The population regression slope from a regression of returns on j-period returns 
is 

$$\frac{\mathrm{cov}(r_t,\; r_{t-1} + \cdots + r_{t-j})}{V(j)} = \frac{\sum_{\tau=1}^{j} \rho_\tau}{j + 2\sum_{i=1}^{j-1}(j - i)\rho_i}$$

for a stationary process. This suggests a test based upon the sum of the first $j - 1$ 
autocorrelations of single-period returns that defines the test statistic: 

$$J(j) = \frac{\sum_{\tau=1}^{j-1} \hat{\rho}_\tau}{\left(\sum_{\tau=1}^{j-1} b_\tau / n\right)^{0.5}}. \tag{6.4}$$

A test could also be performed by estimating the regression slope divided by 
a standard error estimate that is consistent under conditional heteroskedasticity. 
The test statistic $J(j)$ has an asymptotic standard normal distribution when the 
RWH is true. Values of $J(j)$ are provided later for daily returns with the choice 
$j = 250$. 

Jegadeesh (1991) presents test results for regressions of monthly returns on US 
stock portfolios against total returns during the previous four to nine years. 
Evidence for mean-reversion is found, even in the post-war period from 1947 to 1988. 
The dependence in returns, however, is entirely concentrated in the predictability 
of January returns. 


6.3.6 Linear Test Functions 


The T-test of Taylor (1982a), the VR-test of Lo and MacKinlay (1988), the F-test 
of Fama and French (1988), and the J-test of Jegadeesh (1991) are all based upon 
either an exact or an approximate linear function of $k$ autocorrelations. The general 
linear test function is $\sum w_\tau \hat{\rho}_\tau$. The correlation between any two linear functions 
can be substantial. The asymptotic correlation, for i.i.d. data, is 

$$\mathrm{cor}\left(\sum c_\tau \hat{\rho}_\tau,\; \sum d_\tau \hat{\rho}_\tau\right) = \frac{\sum c_\tau d_\tau}{\left(\sum c_\tau^2 \sum d_\tau^2\right)^{0.5}}.$$

The maximum correlations between the T-test statistic, with $k = 30$ and 
$\phi = 0.92$, and the other linear functions are 0.987 when $k = 25$ for the VR-test, 0.91 
when $k = 15$ for the J-test, and 0.79 when $k = 13$ for the F-test. Figure 6.2 
shows the autocorrelation weights $w_\tau$ for the four tests specified in the previous 
sentence when these weights are scaled to make their totals one for each test. 
The optimal selection of weights in a linear test statistic should be proportional 
to the autocorrelations expected when the returns are not random (Richardson 
and Smith 1994b; Daniel 2001). The autocorrelations are $\rho_\tau = A\phi^\tau$ when the 
alternative to randomness is either trends or mean-reversion. It then follows that 
the weights $w_\tau$ should be proportional to $\phi^\tau$, if a meaningful guess at the value 
of $\phi$ is possible. The T-test is then a particularly appropriate test. 

Figure 6.2. Autocorrelation weights for four tests. 
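
The correlations quoted for the four linear test functions can be approximated from their weights alone. The sketch below rests on my reading of how $k$ maps to weights, which is an assumption: variance-ratio weights proportional to $k + 1 - \tau$ over $k$ lags, equal weights over $k$ lags for the J-test, and $\min(\tau, 2j - \tau)$ with $2j - 1 = k$ for the F-test. The function name `weight_correlation` is illustrative.

```python
import math

def weight_correlation(c, d):
    """Asymptotic correlation between sum(c * rho_hat) and sum(d * rho_hat) for i.i.d. data."""
    k = max(len(c), len(d))
    c = list(c) + [0.0] * (k - len(c))   # pad the shorter weight vector with zeros
    d = list(d) + [0.0] * (k - len(d))
    cross = sum(x * y for x, y in zip(c, d))
    return cross / math.sqrt(sum(x * x for x in c) * sum(y * y for y in d))

t_weights = [0.92 ** tau for tau in range(1, 31)]          # T-test, 30 lags
vr_weights = [26 - tau for tau in range(1, 26)]            # assumed VR weights, 25 lags
j_weights = [1.0] * 15                                     # assumed J weights, 15 lags
f_weights = [min(tau, 14 - tau) for tau in range(1, 14)]   # F weights, 13 lags (j = 7)

cor_vr = weight_correlation(t_weights, vr_weights)
cor_j = weight_correlation(t_weights, j_weights)
cor_f = weight_correlation(t_weights, f_weights)
print(f"T and VR: {cor_vr:.3f}")
print(f"T and J:  {cor_j:.3f}")
print(f"T and F:  {cor_f:.3f}")
```

Under these assumptions the three correlations are close to the values 0.987, 0.91, and 0.79 reported above.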


6.4 Spectral Tests 


Spectral analysis is particularly appropriate when cycles in returns are the pre- 
ferred alternative to random behavior. A typical cyclical model is 


$$r_t = \mu + \sum_{j=1}^{J} \alpha_j \cos(\omega_j t - \beta_j) + \varepsilon_t.$$

Cycle $j$ then has frequency $\omega_j$ and is repeated every $2\pi/\omega_j$ time units. The 
evidence for cycles in financial time series is not impressive and consequently 
our discussion of spectral methods is brief. Granger and Newbold (1986, Chap- 
ter 2) describe relevant spectral theory for economic series, while Praetz (1979) 
highlights practical issues when testing returns for a flat spectral density. 

The spectral density function for a covariance stationary process is here defined 
by 

$$s(\omega) = \frac{\sigma^2}{2\pi}\left[1 + 2\sum_{\tau=1}^{\infty} \rho_\tau \cos(\tau\omega)\right], \quad 0 \le \omega \le 2\pi, \tag{6.5}$$

with $\sigma^2$ and $\rho_\tau$ denoting the variance and the autocorrelations of the process. 
The integral of $s(\omega)$ from 0 to $2\pi$ equals $\sigma^2$. As $s(\omega) = s(2\pi - \omega)$, it is only 
necessary to consider frequencies from 0 to $\pi$. The density at $\omega = 0$ is only finite 
for short memory processes. 

Suppose the spectral density function is scaled as follows: 

$$f(\omega) = 2\pi s(\omega)/\sigma^2. \tag{6.6}$$

Then $f(\omega) = 1$ for all $\omega$ when the RWH is true. There will be peaks in a plot of 
$f(\omega)$ at the frequencies $\omega_j$ for the cyclical model. If, however, the autocorrelations 
are those of an ARMA(1, 1) process, so that $\rho_\tau = A\phi^\tau$, $\tau > 0$, then 

$$f(\omega) = 1 - A + \frac{A(1 - \phi^2)}{1 - 2\phi\cos(\omega) + \phi^2}. \tag{6.7}$$

The function $f(\omega)$ varies monotonically from $f(0) = 1 + 2A\phi/(1 - \phi)$ to 
$f(\pi) = 1 - 2A\phi/(1 + \phi)$. For the price-trend hypothesis, $A$ is positive and the 
function is decreasing, with a single thin peak at zero frequency ($\omega = 0$) when $\phi$ 
is almost one. For the mean-reversion hypothesis, $A$ is negative and the function 
is increasing, with $f(0) = 1 - B$, where $B$ is the proportion of returns variance 
due to incorrectly interpreted information. 

To test the RWH it is necessary to first estimate $f(\omega)$ and then apply some test 
for a constant spectral density. Appropriate estimates of $f(\omega)$ are 

$$\hat{f}(\omega) = 1 + 2\sum_{\tau=1}^{M-1} w_\tau \hat{\rho}_\tau \cos(\tau\omega) \tag{6.8}$$

with $M$ an increasing function of the sample size $n$ and with the positive multipliers 
$w_\tau$ chosen to ensure consistent estimates of $f(\omega)$. The Parzen multipliers are used 
here, defined by 

$$w_\tau = \begin{cases} 1 - 6\tau^2(M - \tau)/M^3, & 0 < \tau \le M/2, \\ 2(M - \tau)^3/M^3, & M/2 \le \tau \le M. \end{cases}$$

It is seen that spectral density estimates are linear functions of the first $M - 1$ 
autocorrelations. Plots of the autocorrelations $\hat{\rho}_1, \hat{\rho}_2, \ldots, \hat{\rho}_{M-1}$ and the estimated 
spectral shape, $\hat{f}(\omega)$, $0 \le \omega \le \pi$, provide the same information. The spectral 
picture will be more helpful either when there are cycles or when some of the 
linear functions are particularly informative. 
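
The estimate of the scaled spectral density is a short calculation once the autocorrelations are available. This is a minimal sketch using the Parzen multipliers described above; the function names and the example autocorrelations (a hypothetical price-trend pattern with A = 0.03 and parameter 0.92) are illustrative only.

```python
import math

def parzen_weight(tau, m):
    """Parzen multiplier for lag tau with truncation point m."""
    if tau <= m / 2:
        return 1.0 - 6.0 * tau ** 2 * (m - tau) / m ** 3
    return 2.0 * (m - tau) ** 3 / m ** 3

def spectral_estimate(rho, omega, m):
    """Estimate f(omega) = 1 + 2 * sum of w_tau * rho_tau * cos(tau * omega), tau = 1..m-1.

    rho[tau - 1] is the sample autocorrelation at lag tau."""
    return 1.0 + 2.0 * sum(
        parzen_weight(tau, m) * rho[tau - 1] * math.cos(tau * omega)
        for tau in range(1, m)
    )

M = 100
trend_rho = [0.03 * 0.92 ** tau for tau in range(1, M)]   # hypothetical autocorrelations
f_zero = spectral_estimate(trend_rho, 0.0, M)
f_pi = spectral_estimate(trend_rho, math.pi, M)
print(f"estimate at zero frequency: {f_zero:.3f}")
print(f"estimate at frequency pi:   {f_pi:.3f}")
```

For autocorrelations that follow the price-trend pattern, the estimate exceeds one at zero frequency and is below one at frequency $\pi$; for zero autocorrelations it equals one at every frequency.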

Praetz (1979) shows that estimates $\hat{f}(\omega_1)$ and $\hat{f}(\omega_2)$ are correlated for nearby 
frequencies, with negligible correlation when $|\omega_1 - \omega_2| \ge 4\pi/M$. Consequently, 
tests are here based on $\hat{f}(\omega)$ calculated at frequencies separated by $4\pi/M$. 
Standardized test statistics are given by the following equation, when the 
autocorrelations have estimated variances $b_\tau/n$, for $j = 0, 1, \ldots, M/4$: 

$$f_j = \left[\hat{f}(4\pi j/M) - 1\right] \Big/ \left\{4\sum_{\tau=1}^{M-1} b_\tau n^{-1}\left[w_\tau \cos(4\pi j\tau/M)\right]^2\right\}^{0.5}. \tag{6.9}$$

The statistics $f_j$ can be treated as independent observations from $N(0, 1)$ when the 
RWH is true and asymptotic theory is applicable to the sample autocorrelations. 

Figure 6.3. Spectral density for spot S&P returns. 

Figure 6.4. Spectral density for Treasury bond returns. 


The most plausible cyclical period is one week when studying daily returns. 
The corresponding frequency when there are no holidays is $\omega = 2\pi/5$ and the 
standardized spectral statistic has $j = M/10$. This statistic will be denoted by $f_w$ 
to emphasize the period tested. A one-tail test is appropriate. 

Another test counts the number of significant peaks and troughs in the estimated 
spectral density (Praetz 1979). Let $N_s$ count the number of times that $|f_j|$ exceeds 
some threshold number. Then $N_s$ has an asymptotic binomial distribution when 
the RWH is true. Here we let $M = 100$ and test the RWH using a 5% significance 
level. It is then appropriate to reject the RWH if $|f_j|$ exceeds 1.93 for four or more 
of the twenty-six statistics $f_j$. 

Figure 6.5. Spectral density for DM/$ returns. (Horizontal axis: frequency/π, from 0 to 1.) 


The statistic $f_0$ is of particular interest. Any process that has an autocorrelation 
pattern similar to that of the price-trend hypothesis will have a spectral density 
function that is maximized at zero frequency. A one-tail test is appropriate. 

Figures 6.3, 6.4, and 6.5 are three examples of plots of the estimated spectral 
shape, $\hat{f}(\omega)$, $0 \le \omega \le \pi$, respectively for returns calculated from the spot S&P 
500 index, Treasury bond futures, and DM futures. The dots on these figures 
define confidence bands when the RWH is true; on average, 95% of the spectral 
estimates are within the bands when the RWH holds. There are an insignificant 
number of peaks on each of these figures, but it is instructive to note that the 
highest density estimates are at zero frequency for the two futures series. The 
other peaks can be interpreted as chance outcomes. 


6.5 The Runs Test 


As returns have a nonnormal and perhaps nonstationary distribution, nonpara- 
metric tests can be appropriate. The runs test applied by Fama (1965) is a simple 
example. It is similar to a first-lag autocorrelation test. 

A positive run is a sequence of consecutive positive returns, a no-change run is 
a sequence of zero returns, and a negative run has a similar definition. Let q; be 
the sign of the return r;, thus g; is 1, 0, or —1, respectively for positive, zero, or 
negative r;. Also, let c; be 1 if q; Æ gr41 and 0 otherwise. Then c; = 1 indicates 
that rt commences a new run and thus the total number of runs of all types is 


n—l 
C=1+ bar 
i=l 
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Suppose there are nı positive returns, 7? zero returns, and n3 negative returns in a 
series of n returns. Then the mean and variance of the random variable generating 
C, conditional upon n1, n2, and n3, are 


\[ E[C] = n + 1 - \frac{1}{n} \sum_j n_j^2 \]

and

\[ \mathrm{var}(C) = \frac{\sum_j n_j^2 \left( \sum_j n_j^2 + n + n^2 \right) - 2n \sum_j n_j^3 - n^3}{n^3 - n^2} \]

when the signs $q_t$ are generated by i.i.d. variables (Mood 1940). All the above
summations are over $j = 1, 2, 3$. Let RWH* represent the null hypothesis that
the $q_t$ are i.i.d. It is usual to assume that there is no difference between RWH
and RWH*, although neither hypothesis implies the other. The statistic $C$ has
an approximate normal distribution, for large $n$. Tests can then be decided by
evaluating

\[ K = (C - E[C]) / \sqrt{\mathrm{var}(C)}, \tag{6.10} \]
with RWH* (and RWH) rejected at the 5% level if |K| > 1.96. Trends in prices 
would give fewer runs than expected while a tendency towards price reversals 
would give more runs. 
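
The calculation of $C$, its conditional mean and variance, and $K$ can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the results in this chapter: the function name `runs_test` and the example series are hypothetical, and the variance is written with the denominator $n^3 - n^2 = n^2(n-1)$.

```python
import math

def runs_test(returns):
    """Return (C, E[C], var(C), K) for the runs test applied to a list of returns."""
    # q_t: the sign of each return (1, 0, or -1).
    signs = [(r > 0) - (r < 0) for r in returns]
    n = len(signs)
    # C = 1 + number of times the sign changes between consecutive returns.
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    # n_1, n_2, n_3: counts of positive, zero, and negative returns.
    counts = [signs.count(v) for v in (1, 0, -1)]
    s2 = sum(c ** 2 for c in counts)
    s3 = sum(c ** 3 for c in counts)
    mean = n + 1 - s2 / n
    var = (s2 * (s2 + n + n ** 2) - 2 * n * s3 - n ** 3) / (n ** 3 - n ** 2)
    k = (runs - mean) / math.sqrt(var) if var > 0 else float("nan")
    return runs, mean, var, k

# Hypothetical trending series: five positive returns followed by five negative ones.
example = [0.01] * 5 + [-0.01] * 5
C, mean_C, var_C, K = runs_test(example)
print(C, mean_C, round(var_C, 4), round(K, 3))
```

For this strongly trending example there are only two runs against an expected six, so $K$ is well below $-1.96$.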

The runs test is easy to perform and it avoids all the problems created by 
conditional heteroskedasticity. A possible handicap, however, is a reduction in 
test power due to the loss of information in the transformation from returns to 
their signs. To show this and to further understand the test, let us now assume that 
there are no zero returns. Then the total number of runs becomes 


\[ C = 1 + \frac{1}{4} \sum_{t=1}^{n-1} (q_t - q_{t+1})^2 = \frac{1}{2}(n + 1) - \frac{1}{2} \sum_{t=1}^{n-1} q_t q_{t+1}. \]

As the average of the $q_t$ is approximately zero, and we have assumed $q_t^2 =
1$, their first-lag autocorrelation is approximately $\sum q_t q_{t+1}/n$. Therefore, $C$ is
essentially a linear function of a first-lag autocorrelation. However, the first-lag
autocorrelation of the stochastic process that generates $\{q_t\}$ can be less than for
$\{r_t\}$. For example, a zero-mean, stationary, Gaussian process has


\[
\begin{aligned}
\rho_{1,q} = E[q_t q_{t+1}] &= P(q_t = q_{t+1}) - P(q_t \ne q_{t+1}) \\
&= 2P(q_t = q_{t+1}) - 1 = (2/\pi) \arcsin(\rho_{1,r}) \\
&\approx 0.64 \rho_{1,r}.
\end{aligned}
\tag{6.11}
\]


We see that the runs test is like a special first-lag test that may have less power 
than a conventional first-lag test because the runs test uses less information. 
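
The size of the attenuation in (6.11) is easy to tabulate. The short sketch below evaluates $(2/\pi)\arcsin(\rho_{1,r})$ for a few illustrative values of $\rho_{1,r}$ chosen by me; the function name `sign_autocorrelation` is hypothetical and the Gaussian assumption of (6.11) is taken as given.

```python
import math

def sign_autocorrelation(rho_r):
    """First-lag autocorrelation of the signs of a zero-mean Gaussian process."""
    return (2.0 / math.pi) * math.asin(rho_r)

# The ratio rho_q / rho_r stays close to 2/pi when rho_r is small.
for rho in (0.02, 0.05, 0.10, 0.50):
    rho_q = sign_autocorrelation(rho)
    print(f"rho_r = {rho:.2f}  rho_q = {rho_q:.4f}  ratio = {rho_q / rho:.3f}")
```

For the small autocorrelations typical of daily returns the ratio is essentially $2/\pi$, so the signs carry about 64% of the first-lag autocorrelation of the returns.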

The test is problematic whenever there is thin trading in the asset, because 
there may then be several instances of consecutive zero returns. These may be 
responsible for fewer total runs than expected, thereby permitting rejection of the 
independence assumption (and RWH*) even when there is no serial correlation. 




6.6 Rescaled Range Tests 


Alternatives to uncorrelated processes possess some linear dependence, that can 
be classified as having either a short or a long memory. Short memory pro- 
cesses, including ARMA processes, have autocorrelations that converge rapidly 
to zero. Long memory processes, such as the ARFIMA processes mentioned in 
Section 3.8, have autocorrelations that converge at a slower rate to zero; these
processes can have autocorrelations $\rho_\tau$ that are approximately proportional to $\tau^{2d-1}$,
$0 < d < 0.5$, for high lags $\tau$. Range statistics that have power to detect long-term
dependence were first developed for hydrological data (Hurst 1951) and later 
applied to financial returns (Mandelbrot 1972). Lo (1991) provides many refer- 
ences and a rigorous description of appropriate tests when the preferred alternative 
to randomness is long-term dependence. 

The range defined by a set of returns $\{r_1, \ldots, r_n\}$ is similar to the range of the
price logarithms. It is defined using partial sums of deviations from the average
$\bar{r}$ to be

\[ M_n = \max_{1 \le T \le n} \sum_{t=1}^{T} (r_t - \bar{r}) - \min_{1 \le T \le n} \sum_{t=1}^{T} (r_t - \bar{r}). \]

R/S-test statistics are ranges divided by scaled standard deviations,

\[ (R/S) = \frac{M_n}{\sqrt{n}\,\hat{\sigma}}, \tag{6.12} \]

where $n\hat{\sigma}^2$ is a constant multiplied by some consistent estimate of the variance
of $M_n$. Two special cases define Mandelbrot's and Lo's test statistics; thus,

\[
\begin{aligned}
\hat{\sigma} &= s && \text{defines } (R/S)_{\mathrm{Man}}, \\
\hat{\sigma}^2 &= s^2 \left( 1 + 2 \sum_{j=1}^{q} \left( 1 - \frac{j}{q+1} \right) \hat{\rho}_j \right) && \text{defines } (R/S)_{\mathrm{Lo}},
\end{aligned}
\tag{6.13}
\]


with $s^2$ the sample variance of returns. Lo's expression for $\hat{\sigma}^2$ is proportional to
an estimate of the spectral density at zero frequency calculated using the first $q$
autocorrelations of the returns. Under certain assumptions, the distributions of
these statistics converge, as $n$ and $q$ increase, to that of the range of a Brownian
bridge on the unit interval. We use $q = 20$ for our tests.

The null hypothesis of an uncorrelated process can be tested using $(R/S)_{\mathrm{Man}}$,
with some restrictions on moments and heterogeneity. This test has power against
short memory dependence. Lo focuses on the null hypothesis of a short memory
process, with explicit additional restrictions on the degree of dependence, and then
the appropriate test statistic is $(R/S)_{\mathrm{Lo}}$. The asymptotic results are applicable to
many stationary, ARCH processes. A two-tailed test rejects the null hypothesis at
the 5% significance level when an R/S statistic is below 0.809 or above 1.862.
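
A minimal sketch of both statistics follows, assuming the range is scaled by $\sqrt{n}\,\hat{\sigma}$ and that $s^2$ and the autocorrelations $\hat{\rho}_j$ are computed with divisor $n$ (the text does not fix these details). The function name `rescaled_range` is hypothetical; setting `q=0` gives the Mandelbrot form and `q=20` the Lo form used here.

```python
import numpy as np

def rescaled_range(returns, q=0):
    """R/S statistic M_n / (sqrt(n) * sigma_hat); q = 0 gives the Mandelbrot form."""
    r = np.asarray(returns, dtype=float)
    n = r.size
    dev = r - r.mean()
    partial = np.cumsum(dev)               # partial sums for T = 1, ..., n
    m_n = partial.max() - partial.min()    # the range M_n
    s2 = np.mean(dev ** 2)                 # sample variance (divisor n: an assumption)
    sigma2 = s2
    for j in range(1, q + 1):
        # Lag-j sample autocorrelation, then Lo's weight 1 - j / (q + 1).
        rho_j = np.sum(dev[j:] * dev[:-j]) / (n * s2)
        sigma2 += 2.0 * s2 * (1.0 - j / (q + 1.0)) * rho_j
    return m_n / np.sqrt(n * sigma2)

# Illustration with simulated i.i.d. returns (not data from this book).
rng = np.random.default_rng(0)
simulated = rng.standard_normal(2500)
print("Mandelbrot:", rescaled_range(simulated))
print("Lo, q = 20:", rescaled_range(simulated, q=20))
```

Either value would then be compared with the critical values 0.809 and 1.862 quoted above.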

Lo (1991) reports results for daily returns from US stock indices and concludes
that the returns process is characterized by short-term dependence because the
values of $(R/S)_{\mathrm{Man}}$ reject the RWH but the values of $(R/S)_{\mathrm{Lo}}$ are insignificant and
therefore do not support long-term dependence. The same conclusion is consistent
with the results of Greene and Fielitz (1977) and Hiemstra and Jones (1997) for 
daily returns from large samples of individual US stocks. Goetzmann (1993) 
applies rescaled range tests to annual index returns, as far back as 1700 for the 
London market, and concludes that there may be long-term dependence over very 
long horizons. 


6.7 The BDS Test 


Chaotic dynamics motivates the BDS test of Brock et al. (1987, 1996). The BDS 
test has the power to detect many alternatives to an i.i.d. process, so that it is of 
interest regardless of beliefs about chaotic effects. 

Chaotic processes are nonlinear and deterministic. They can have occasional 
large changes and periods of extraordinary volatility. Their sample autocorrela- 
tions can converge to zero for increasingly large datasets and thus the realizations 
of a chaotic process can mimic many important features of financial time series. 
Several examples are presented by Hsieh (1991), along with several results for 
the application of the BDS test to financial time series. Gleick (1987) provides 
a nontechnical introduction to chaos while Baumol and Benhabib (1989) show 
how economic models could produce chaotic dynamics. 


6.7.1 Correlation Integrals 


For a sample of $n$ observations $\{x_1, \ldots, x_n\}$, an embedding dimension $m$, and a
distance $\varepsilon$, the correlation integral $C_m(n, \varepsilon)$ is estimated by

\[
\begin{aligned}
I(x_s, x_t, \varepsilon) &= \begin{cases} 1 & \text{if } |x_s - x_t| < \varepsilon, \\ 0 & \text{otherwise,} \end{cases} \\
I_m(x_s, x_t, \varepsilon) &= \prod_{k=0}^{m-1} I(x_{s+k}, x_{t+k}, \varepsilon), \\
C_m(n, \varepsilon) &= \frac{2}{(n-m)(n-m+1)} \sum_{s=1}^{n-m} \sum_{t=s+1}^{n-m+1} I_m(x_s, x_t, \varepsilon).
\end{aligned}
\tag{6.14}
\]

The function $I(\cdot)$ indicates whether or not the observations at times $s$ and $t$ are near
each other, as determined by the distance $\varepsilon$. The product $I_m(\cdot)$ is only one when
the two $m$-period histories $(x_s, x_{s+1}, \ldots, x_{s+m-1})$ and $(x_t, x_{t+1}, \ldots, x_{t+m-1})$
are near each other in the sense that each term $x_{s+k}$ is near $x_{t+k}$. The estimate
of the correlation integral is the proportion of pairs of $m$-period histories that are
near each other.
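
The estimate can be written directly from its definition. The sketch below is a plain double loop, chosen for clarity rather than speed; the function name `correlation_integral` and the five-observation example are hypothetical.

```python
def correlation_integral(x, m, eps):
    """Proportion of pairs of m-period histories that are near each other."""
    n = len(x)
    histories = n - m + 1                    # number of m-period histories
    count = 0
    for s in range(histories - 1):           # s = 1, ..., n - m (0-based here)
        for t in range(s + 1, histories):    # t = s + 1, ..., n - m + 1
            # I_m is one only if every pair of corresponding terms is near.
            if all(abs(x[s + k] - x[t + k]) < eps for k in range(m)):
                count += 1
    return 2.0 * count / ((n - m) * (n - m + 1))

x = [0.0, 0.1, 0.5, 0.05, 0.15]
print(correlation_integral(x, 1, 0.2))   # 6 of the 10 pairs are near
print(correlation_integral(x, 2, 0.2))   # 1 of the 6 pairs of 2-period histories
```

In this small example the only pair of 2-period histories that are near each other is $(0.0, 0.1)$ and $(0.05, 0.15)$.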

For observations from many processes, including stationary processes, we can
define the limit

\[ C_m(\varepsilon) = \lim_{n \to \infty} C_m(n, \varepsilon). \]

When the observations are from an i.i.d. process, the probability of $m$ consecutive
near pairs of observations is simply the product of $m$ equal probabilities and hence

\[ C_m(\varepsilon) = C_1(\varepsilon)^m. \]

When the observations are from a chaotic process, however, the conditional
probability of $x_{s+k}$ being near $x_{t+k}$, given that $x_{s+j}$ is near $x_{t+j}$ for $0 \le j < k$, is
higher than the unconditional probability and hence

\[ C_m(\varepsilon) > C_1(\varepsilon)^m. \]


The conditional probabilities can be much higher than the unconditional prob- 
abilities when m is large compared with the correlation dimension defined by 
Grassberger and Procaccia (1983), because the m-period histories of chaotic pro- 
cesses fill less space in m dimensions than i.i.d. processes (Hsieh 1991). 
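
The contrast between the two cases can be seen in a small experiment. The sketch below compares $C_2$ with $C_1^2$ for a chaotic series and for i.i.d. draws; the logistic map $x_{t+1} = 4x_t(1 - x_t)$, the starting value, the distance $\varepsilon = 0.1$, and the sample size are my illustrative choices and are not taken from the studies cited above.

```python
import numpy as np

def correlation_integral(x, m, eps):
    """Vectorised estimate of C_m(n, eps) for a one-dimensional series."""
    x = np.asarray(x, dtype=float)
    h = x.size - m + 1                        # number of m-period histories
    near = np.ones((h, h), dtype=bool)
    for k in range(m):
        seg = x[k:k + h]
        near &= np.abs(seg[:, None] - seg[None, :]) < eps
    upper = np.triu_indices(h, k=1)           # pairs of histories with s < t
    return near[upper].mean()

n, eps = 500, 0.1

# Chaotic series from the logistic map (an assumed example of chaos).
chaotic = np.empty(n)
chaotic[0] = 0.3
for t in range(1, n):
    chaotic[t] = 4.0 * chaotic[t - 1] * (1.0 - chaotic[t - 1])

# i.i.d. benchmark: uniform draws on (0, 1).
iid = np.random.default_rng(1).uniform(size=n)

for name, series in (("logistic map", chaotic), ("i.i.d. uniform", iid)):
    c1 = correlation_integral(series, 1, eps)
    c2 = correlation_integral(series, 2, eps)
    print(f"{name}: C_2 = {c2:.4f}, C_1^2 = {c1 ** 2:.4f}")
```

For the deterministic series, nearby observations tend to be followed by nearby observations, so $C_2$ exceeds $C_1^2$ by a wide margin; for the i.i.d. draws the two quantities are almost equal.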


6.7.2 The Test Statistic 


These properties of correlation integrals led BDS to consider the random variable
$\sqrt{n}\,(C_m(n, \varepsilon) - C_1(n, \varepsilon)^m)$. For an i.i.d. process, the distribution of this variable
converges to a normal distribution as $n$ increases, with mean zero and variance
$V_m$ determined by the distribution of the i.i.d. random variables. The BDS test
statistic is given by

\[ W_m(\varepsilon) = \sqrt{\frac{n}{\hat{V}_m}} \left( C_m(n, \varepsilon) - C_1(n, \varepsilon)^m \right) \tag{6.15} \]

with $\hat{V}_m$ a consistent estimate of $V_m$. The statistic $W_m(\varepsilon)$ is compared with $N(0, 1)$
and it is conventional to perform a two-tail test, as certain (nonchaotic) alternatives
to i.i.d. will often give negative values of the test statistic. There are many equations
for a consistent estimator of $V_m$ (in, for example, Hsieh 1989). Following Brock,
Dechert, Scheinkman, and LeBaron (1996), the results here are calculated using

\[ \hat{V}_m = 4 \left( K^m + (m-1)^2 C^{2m} - m^2 K C^{2m-2} + 2 \sum_{j=1}^{m-1} K^{m-j} C^{2j} \right) \tag{6.16} \]

with $C = C_1(n, \varepsilon)$ and


6


6.7.3 Properties


Hsieh (1991) provides simulation evidence that shows the BDS test has power to 
detect chaotic processes and many other alternatives to i.i.d. when the parame- 
ters of the alternative processes are sufficiently far from zero. These alternatives 
include linear AR(1) and MA(1) processes, processes with step changes in their 
mean and/or variance, nonlinear moving-average processes, threshold autoregres- 
sive processes, and, most important, ARCH processes. The intuition for power 
against ARCH alternatives is as follows. These alternatives display volatility
clustering. Then, when a set of variables $x_{s+j}$ are near $x_{t+j}$ for $0 \le j < k$, it is more
likely than at other times that the $k$ pairs come from periods of low volatility.
Volatility clustering then implies that the next pair, $(x_{s+k}, x_{t+k})$, is more likely to
come from low volatility periods and thus be near than at other times.

An extension of the BDS test that tests the null hypothesis of a linear process 
involves replacing the original data by the residuals from an estimated linear 
model. The asymptotic distribution of the BDS test statistic is unchanged by this 
filtering operation (Brock 1987). Unfortunately, there does not appear to be a 
similar result for standardized residuals from uncorrelated ARCH processes, that 
is, for returns minus their mean divided by the conditional standard deviations 
given by an estimated model. Hsieh (1991) shows that routine comparison of 
test results with a normal distribution can be inappropriate for 1000 standardized 
residuals from an ARCH process. 


6.7.4 Results 


The BDS test has often rejected the i.i.d. hypothesis for returns but the test values 
are nearly always much smaller, and frequently insignificant, for standardized 
residuals from ARCH models. See, for example, Hsieh (1989) for daily foreign 
exchange returns, Hsieh (1991) for weekly, daily, and 15-minute returns from US 
stock indices and decile portfolios, and Abhyankar, Copeland, and Wong (1995, 
1997) for 5-minute and more frequent returns from several stock indices. 

The BDS test has two parameters: the length $m$ of the $m$-period histories that
are compared and the distance measure $\varepsilon$. It is commonplace to report a table of
test results with $m = 2, 3, 4, \ldots$ and $\varepsilon$ equal to various multiples of the data's
standard deviation $s$. Multiples such as 0.5, 1, and 1.5 are often chosen. All entries
in the table of test values may give the same conclusion, especially when testing
returns, but this is unlikely when testing standardized residuals. We focus here
on test results when (i) $m = 2$, $\varepsilon = 0.75s$, (ii) $m = 4$, $\varepsilon = s$, and (iii) $m = 8$,
$\varepsilon = 1.25s$. The corresponding test statistics are called $W_2$, $W_4$, and $W_8$.
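
The three statistics can be sketched from (6.15) and (6.16) as follows. This is an illustration rather than the program used for the results reported here. In particular, the estimator of $K$ is an assumption on my part: it is taken to be the proportion of triples of distinct times $(s, t, u)$ for which $x_s$ is near $x_t$ and $x_t$ is near $x_u$, which is the usual construction for the BDS test. The function name `bds_statistic` and the simulated data are hypothetical.

```python
import numpy as np

def bds_statistic(x, m, eps):
    """BDS statistic W_m(eps) built from equations (6.15) and (6.16)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    near = np.abs(x[:, None] - x[None, :]) < eps     # I(x_s, x_t, eps)
    np.fill_diagonal(near, False)                    # exclude s = t

    # C = C_1(n, eps): proportion of near pairs.
    c = near.sum() / (n * (n - 1))

    # K (assumed estimator): proportion of ordered triples of distinct times
    # (s, t, u) with x_s near x_t and x_t near x_u.
    row = near.sum(axis=1).astype(float)
    k = np.sum(row * row - row) / (n * (n - 1) * (n - 2))

    # C_m(n, eps): proportion of near pairs of m-period histories.
    h = n - m + 1
    near_m = np.ones((h, h), dtype=bool)
    for j in range(m):
        near_m &= near[j:j + h, j:j + h]
    c_m = near_m.sum() / (h * (h - 1))

    # Variance estimate from equation (6.16).
    v = 4.0 * (k ** m + (m - 1) ** 2 * c ** (2 * m)
               - m ** 2 * k * c ** (2 * m - 2)
               + 2.0 * sum(k ** (m - j) * c ** (2 * j) for j in range(1, m)))
    return np.sqrt(n / v) * (c_m - c ** m)

# Illustration with simulated i.i.d. data and the three parameter choices above.
rng = np.random.default_rng(0)
data = rng.standard_normal(500)
s = data.std()
for m, multiple in ((2, 0.75), (4, 1.00), (8, 1.25)):
    print(f"W_{m} = {bds_statistic(data, m, multiple * s):.2f}")
```

The sketch stores an $n \times n$ matrix of indicators, which is convenient for a few hundred observations but becomes memory-hungry for series of 2500 returns; a production implementation would avoid that.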


6.8 Test Results for the Random Walk Hypothesis 


The RWH is now evaluated for the test statistics described in Sections 6.3-6.7 
and comparisons are made with the first-lag and variance-ratio tests defined in

Table 6.1. Rejection frequencies for random walk tests. (The tabulated numbers are
the number of series, out of 20, that reject the RWH at the 5% significance level.)


                                                        Returns          Rescaled returns
Estimates of $b_\tau$ included in the test          No  No  Yes Yes    No  No  Yes Yes
Crash week                                          In  Out In  Out    In  Out In  Out
Column                                              1   2   3   4      5   6   7   8
Test
First autocorrelation, $\hat{\rho}_1$               7   10  2   6      9   8   9   8
Number of significant autocorrelations, $N_\rho$    9   6   0   0      3   1   2   3
Portmanteau, to lag 10, $Q_{10}$                    12  9   2   6      8   8   6   7
Portmanteau, to lag 30, $Q_{30}$                    14  11  0   0      7   7   5   6
Portmanteau, to lag 50, $Q_{50}$                    15  15  1   2      4   3   3   4
Trend test, $T$                                     4   3   3   2      12  11  12  11
Variance ratio, week/day, $z_5$                     9   7   0   3      d We ED
Variance ratio, month/day, $z_{20}$                 D2. 5s (025. 3B    9   9   8   8
Multi-period first autocorrelation, $F(125)$        2   2   2   1      3   3   3   3
Multi-period regression, $J(250)$                   35 49:7 3 E        8   7   8   8
Spectral density at zero frequency, $f_0$           4   2   2   2      9   8   10  8
One-week cycle, $f_w$                               4   3   1   1      2v De S26 2
Number of spectral peaks, $N_s$                     7   4   0   0      2 42 2.2
Runs test, $K$                                      9   9
Rescaled range, $(R/S)_{\mathrm{Man}}$              6   6
Modified rescaled range, $(R/S)_{\mathrm{Lo}}$      1   1
BDS test, 2-period histories, $W_2$                 19  10
BDS test, 4-period histories, $W_4$                 19  10
BDS test, 8-period histories, $W_8$                 20  7


Stocks first autocorrelation, others trend test     6   8   3   7      14  13  13  13


Rescaled returns are returns minus their sample mean divided by an estimate of their
conditional standard deviation. The tests are robust against conditional heteroskedasticity when
estimates of $b_\tau = n\,\mathrm{var}(\hat{\rho}_\tau)$ are included in the tests. Crash week out indicates that returns
during the crash week, commencing 19 October 1987, are excluded. The following tests reject
the null hypothesis in only one tail: $N_\rho$, $Q_{10}$, $Q_{30}$, $Q_{50}$, $T$, $f_0$, $f_w$, $N_s$.


Chapter 5. Results are discussed for the set of twenty time series introduced in 
Chapter 2. Each series contains approximately 2500 daily returns and results are 
also obtained from the rescaled returns defined in Section 5.7. 

As far as is possible, RW test statistics have been calculated from a selection 
of formulae depending on three choices: (a) either returns or rescaled returns 
are used; (b) the variances of autocorrelations at lag $\tau$ are either assumed to be
$1/n$ or estimated to be $b_\tau/n$; and (c) either all the (rescaled) returns are used or
the crash week, commencing on 19 October 1987, is excluded. All the sample
autocorrelations used in the tests are adjusted for the bias predicted under the null
hypothesis by equation (5.15); the quantity $(n - \tau)/(n(n - 1))$ is added to the
usual estimate of the autocorrelation $\hat{\rho}_\tau$.

Table 6.1 presents an overview of the test results for nineteen test statistics,
with the eight columns showing how the three choices impact on the number of
significant test results. The tabulated numbers show how many series reject the
RWH at the 5% significance level. Only one series (on average) should reject the
null hypothesis when it is true for all the assets and the size of the test is correct. 
The rejection frequency is clearly much higher in all columns of the table. We first 
discuss the information conveyed by Table 6.1 and follow this by a discussion of 
the results by test and then by asset. 


6.8.1 Sensitivity to Specification of the Tests 


The highest rejection frequencies are found in the first two columns, for returns 
when there are no corrections for conditional heteroskedasticity (ARCH effects). 
These high frequencies are evidence against the hypothesis that returns are i.i.d.;
they are not, however, evidence against the more general RWH. The rejection 
of the i.i.d. hypothesis is most decisive using the BDS test. This is an interesting 
observation, although it must be remembered that the autocorrelations of absolute 
returns also reject the i.i.d. hypothesis convincingly (see Section 4.10). 

The third and fourth columns of Table 6.1 show that the numbers of significant
test results are much reduced when estimated variances $b_\tau/n$ are incorporated
into the autocorrelation tests. This is particularly apparent for the portmanteau
statistic $Q_{50}$. This statistic rejects the i.i.d. hypothesis for fifteen of the twenty
series but rejects the RWH for only one or two series. This happens because the
squares of autocorrelation estimates, $\hat{\rho}_\tau^2$, have expectations that are often far above
$1/n$ because of the ARCH effects; hence Q-statistics calculated from $k$ lags have
expectations far above $k$ when ARCH effects are ignored.

Comparisons between the third and fourth columns show that some of the 
rejection frequencies are particularly sensitive to the numbers in the crash week 
when tests use returns. The seventh and eighth columns show that this effect is 
not observed for the rescaled returns. 

The four columns for rescaled returns in Table 6.1 show that the autocorrelation 
test results for this type of data are not sensitive to the other two choices. The 
estimated variances $b_\tau/n$ are then similar to $1/n$ and so either variance measure
gives similar test results. The rescaling of returns reduces the relative magnitude 
of the equity numbers in the crash week and this explains why few test decisions 
change when the crash week is removed. 

There are many more significant results from rescaled returns than from returns 
when ARCH effects are eliminated from the tests, as can be seen by comparing 
column four with column eight. Linear dependence is more likely to be significant 
if rescaling (i) reduces the variability of autocorrelations, and/or (ii) increases the 
average estimate of the magnitude of the dependence. The first explanation is true 
for my data and the second is supported by Figure 5.5 and the Monte Carlo results

Table 6.2. Results of the BDS test.


                                     Returns                 Rescaled returns
$m$                              2       4       8          2       4       8
$\varepsilon/s$               0.75    1.00    1.25       0.75    1.00    1.25


S&P 500-share            S    2.31    4.87    8.91      −1.49   −1.28   −0.13
S&P 500-share            F    3.79    6.60   11.89      −1.34   −1.43    0.35
Coca Cola                S    8.50   10.67   14.39       4.11    3.72    3.94
General Electric         S    6.66    8.73   12.54       2.08    1.23    1.06
General Motors           S    4.84    5812   11.20       1.50    2.23    2.42
FT 100-share             S    3.68    7.05   11.28      −1.65    0.02    1.00
FT 100-share             F    5.69    8.99   13.41      −0.68    0.46    0.86
Glaxo                    S    6.93   11.37   16.86       2.24    2.87    3.32
Marks & Spencer          S    5.51    6.57    9.16       2.84    3.33    4.05
Shell                    S    6.91    2949   11.27       2.92    3.41    2.91
Nikkei 225-share         S   17.01   26.60   39.05       1.60    3.69    4.22
Treasury bonds           F    3.49    6.72   12.39      −1.95   −2.44   −0.89
3-month sterling bills   F   13.35   19.53   24.89       1.96    2.95    3.51
DM/$                     F    1.96    4.46    9.13      −1.52   −1.24    0.01
Sterling/$               F    3.78    5.48    8.85       0.67    0.58    1.54
Swiss franc/$            F    1.52    2.90    6.54      −1.11   −1.71   −0.53
Yen/$                    F    5.86    8.52   13.42       3.35    4.42    6.33
Gold                     F    9.66   12.21   16.54       2.03    1.60    1.93
Corn                     F   11.72   17.45   26.62       0.30    0.94    1.69
Live cattle              F    3.84    9.96   18.08      −1.97   −0.47    1.05


$m$ is the embedding dimension, $\varepsilon$ is a distance that is used to decide if returns are near each
other, $s$ is the standard deviation of the returns after excluding the crash week. The BDS
test statistics are calculated using all the returns, including those in the crash week. The
asymptotic null distribution of the test statistics is $N(0, 1)$. Monte Carlo methods are used to
obtain quantiles for tests applied to rescaled returns. The 2.5% and 97.5% quantiles used for
the tests are −1.25 and 2.72 when $m$ is 2, −0.78 and 3.00 when $m$ is 4, −0.36 and 3.39 when
$m$ is 8.

in Section 6.9. LeBaron (1992) also considers the empirical relationship between 
linear dependence and volatility. 


6.8.2 Results by Test 


Now we discuss the results of thirteen autocorrelation tests evaluated from re- 
scaled returns with the crash week excluded, the runs test, two rescaled range 
tests evaluated using all the returns, and three BDS tests evaluated from all the 
rescaled returns. The results from these specifications of nineteen tests reject the 
RWH for 32% of the tests performed, at the 5% significance level. Test values are 
listed for the BDS test in Table 6.2 and for the first-lag and variance-ratio tests in 
Table 5.2. 

The final row of Table 6.1 shows the rejection frequency for the hybrid test 
recommended after the tests in my previous book (Taylor 1986, p. 171). The
hybrid test applied to an equity series uses the first autocorrelation $\hat{\rho}_1$ and for all
other assets it uses the trend statistic $T$. As the final row shows, the hybrid test
procedure rejects more often than any of the nineteen tests; it rejects the RWH
for thirteen of the twenty series. The $\hat{\rho}_1$-test rejects for six of the eleven equity
series, and the $T$-test rejects for seven of the remaining nine series.

The highest rejection frequency for a single test is eleven using the trend test
$T$. The next highest frequency is ten for two of the BDS tests, followed by nine
for the runs test, and eight for the $\hat{\rho}_1$-test, the twenty-day variance-ratio test, $z_{20}$,
the zero frequency spectral test, $f_0$, and the regression test of returns against
lagged annual returns, $J(250)$. However, the high scores for the BDS tests may
be unreliable and misleading in view of the Monte Carlo results discussed in the 
next section. 

The test statistics are allocated into three sets. First, there is the T-test and 
statistics whose rejections of the RWH are frequently for series that are also found 
to be nonrandom by the $T$-test. The three statistics $z_{20}$, $f_0$, and $J(250)$, like $T$,
are all linear functions of a large number of positively weighted autocorrelations,
with the weights either decreasing or constant as the lag increases. Each of these
three statistics rejects the RWH for eight series. They respectively reject the RWH
for eight, seven, and six of the eleven series for which $T$ rejects the RWH. The test
values for $T$ and $z_{20}$ are very similar and the correlation between the twenty pairs
of values exceeds 0.99. The rejection count for $z_{20}$ increases to ten if one-tailed
tests are performed.

The T-test rejects for seven of the eleven futures series (including three of 
the four currency series) and all three spot stock index series, but for only one 
of the six individual stock series. The fọ- and J (250)-tests also reject for seven 
futures series, but for only the Nikkei from among the spot series. The number of 
significant autocorrelations N, can be allocated to the first set; T rejects for two 
of the three series for which N, rejects randomness. 

The second set contains the first autocorrelation statistic $\hat{\rho}_1$ and tests whose
rejections of the RWH are frequently for series that are also found to be nonrandom
by $\hat{\rho}_1$. The three Q-statistics have a total of seventeen rejections. The $\hat{\rho}_1$-test
rejects on sixteen of these occasions. The five-day variance-ratio test, $z_5$, rejects
for seven series and on every occasion $\hat{\rho}_1$ also rejects the RWH. The runs test, $K$,
rejects for nine series, six of which are also rejected by $\hat{\rho}_1$, and then the numbers
of runs are less than expected for a random process. The number of significant
spectral estimates $N_s$ and the test for a weekly cycle $f_w$ can both be allocated
to the second set as, in each case, $\hat{\rho}_1$ rejects for the two series for which the test
rejects randomness.

The statistic $\hat{\rho}_1$ rejects the RWH for all three spot stock indices, all three UK
stock series, and two of the futures series. All these rejections indicate significant,
positive first-lag dependence. There are no significant, negative test values and
there are no rejections for the three US stock series.

The remaining tests form a third set. The three BDS tests reject the RWH for
ten, ten, and seven series; the $\hat{\rho}_1$- and $T$-tests reject the hypothesis for between
three and five of these series. The rescaled range tests in their original form reject
the RWH for six series with significant evidence of positive dependence found
for all four currencies; $T$ rejects for three of the six series and $\hat{\rho}_1$ for one of
the six. Only one series rejects the RWH when Lo's modification is used so the
dependence may be classified as short term. Finally, the test based upon the first-
lag autocorrelation of semi-annual returns, $F(125)$, rejects the RWH for three
series, only one of which is rejected by either $T$ or $\hat{\rho}_1$.


6.8.3 Results by Asset 


A simple count of how many times the nineteen tests reject randomness for each 
asset allows classification into three groups. In the first group, with the most 
rejections, are corn futures (11), three-month sterling bill futures (10), and Shell 
(10). In the second group, with between six and nine rejections, we find the spot 
stock indices and the exchange rates. The rejection counts are 9 for the Nikkei 
225 index, the yen and Glaxo, 8 for the S&P 500 index, the Deutsche mark and 
the Swiss franc, 7 for the FTSE 100 index and 6 for sterling and Marks & Spencer. 
In the third group, with five or fewer rejections, are the other series: all three large
US firms (Coca Cola (3), General Electric (0), General Motors (2)) and futures 
for Treasury bonds (5), the S&P 500 (4), the FTSE 100 (2), live cattle (2), and 
gold (1). 
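
The counts by test and by asset must give the same total, and a short calculation confirms that they do. The sketch below simply re-enters the numbers quoted in Sections 6.8.2 and 6.8.3; the dictionary names are hypothetical.

```python
# Rejections by test, as quoted in Section 6.8.2 (the three Q-statistics are pooled).
by_test = {
    "rho_1": 8, "N_rho": 3, "Q_10 + Q_30 + Q_50": 17, "T": 11,
    "z_5": 7, "z_20": 8, "F(125)": 3, "J(250)": 8, "f_0": 8,
    "f_w": 2, "N_s": 2, "K": 9, "(R/S)_Man": 6, "(R/S)_Lo": 1,
    "W_2": 10, "W_4": 10, "W_8": 7,
}

# Rejections by asset, as listed in Section 6.8.3.
by_asset = {
    "Corn F": 11, "Sterling bills F": 10, "Shell": 10,
    "Nikkei 225": 9, "Yen": 9, "Glaxo": 9,
    "S&P 500 S": 8, "Deutsche mark": 8, "Swiss franc": 8,
    "FTSE 100 S": 7, "Sterling": 6, "Marks & Spencer": 6,
    "Coca Cola": 3, "General Electric": 0, "General Motors": 2,
    "Treasury bonds F": 5, "S&P 500 F": 4, "FTSE 100 F": 2,
    "Live cattle F": 2, "Gold F": 1,
}

total_by_test = sum(by_test.values())
total_by_asset = sum(by_asset.values())
tests_performed = 19 * 20   # nineteen tests applied to twenty series

print(total_by_test, total_by_asset)                              # 120 120
print(f"{100 * total_by_test / tests_performed:.1f}% rejected")   # 31.6% rejected
```

Both tallies give 120 rejections from 380 tests, which is the 32% rejection rate reported in Section 6.8.2.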

It is not surprising that there is more dependence in spot indices than in index 
futures because spot indices are not traded assets. The equity dependence is found
by the $\hat{\rho}_1$-test and by tests that frequently reject when $\hat{\rho}_1$ also rejects randomness.
The currency dependence is detected by the T-test and other tests that are linear 
functions of many autocorrelations. 

There are interdependencies between the twenty time series of returns that 
imply some dependence between their test values. This dependence is potentially 
substantial for the spot indices and their futures (Ahn et al. 2002) and also for the 
four currencies. Appendix 6.12 provides theoretical results about the dependence 
of test values from a pair of cross-correlated time series when the RWH is true 
for both series. 


6.8.4 Dependence through Time 


The hybrid test procedure uses the first autocorrelation $\hat{\rho}_1$ for all equity series
and the $T$-test for all other series. It rejected randomness at the 5% significance
level for all forty series in Taylor (1986). Here, however, the hybrid test only
rejects randomness for thirteen series out of twenty. Dependence among returns
therefore appears to be diminishing as time progresses.

Returns on General Electric and General Motors stock from 1966 to 1976 have
significant positive values of $\hat{\rho}_1$ in Taylor (1986), but not here from 1984 to 1993.
Returns on sterling, Deutsche mark, and Swiss franc futures all have lower test
values for $T$ and $f_0$ here, for the period from 1982 to 1991, than for the earlier
period from 1974 to 1981 covered in Taylor (1986). Campbell, Lo, and MacKinlay
(1997, Section 2.8) also report a decline in dependence through time, for daily and
weekly returns from the CRSP value- and equal-weighted indices.


6.9 The Size and Power of Random Walk Tests 


The random walk tests have been performed with critical values obtained from 
asymptotic theory. The test results for rescaled returns use ARCH models that 
are probably mis-specified. Asymptotic theory and/or rescaling the returns might 
distort the size of a test. Monte Carlo simulations help to clarify the size of the 
tests. 

Table 6.1 shows that the rejection frequency is firstly far higher for some tests 
than for others and is secondly higher for rescaled returns than for returns, once the 
tests are adjusted for ARCH effects. These variations in the rejection rates call for 
some explanation. Monte Carlo estimates of test power for special alternatives 
to randomness show that there are stochastic processes that can explain many 
features of the observed test results. 


6.9.1 A Returns Model for Simulations 


Simulated uncorrelated returns are obtained from the product of two independent
processes; thus,

\[ r_t = \sigma_t \varepsilon_t. \]

The stochastic volatility (SV) process $\{\sigma_t\}$ is defined by supposing $\{\log(\sigma_t)\}$ is
AR(1) and Gaussian, with mean $\alpha$, standard deviation $\beta$, and autocorrelations $\phi^\tau$.
The variables $\varepsilon_t$ are i.i.d., standard normal variables. This SV model for returns
is discussed at length in Chapter 11.

Simulated correlated returns are obtained by introducing a third independent
stochastic process $\{\mu_t\}$, which specifies the mean component:

\[ r_t = \mu_t + \sigma_t \varepsilon_t. \]

The $\mu_t$ are all zero for the size calculations but are generated by some linear
process for the power calculations. All of the nonlinear effects are then in the
uncorrelated component $\sigma_t \varepsilon_t$. This rather curious assumption can explain why
rescaled returns have more dependence than returns.


6.9.2 Size


Estimates of the size of the tests have been obtained by simulating series of 2500
returns with volatility parameters $\alpha = -5.15$, $\beta = 0.422$, and $\phi = 0.973$. These
have been realistic values for currency returns (Taylor 1994a,b).
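
One such series can be simulated as follows. This is a minimal sketch of the uncorrelated returns model of Section 6.9.1, not the program used for Table 6.3; the function name `simulate_sv_returns`, the random number generator, and the choice to start the volatility process from its stationary distribution are my assumptions.

```python
import numpy as np

def simulate_sv_returns(n=2500, alpha=-5.15, beta=0.422, phi=0.973, seed=0):
    """Simulate r_t = sigma_t * eps_t with log(sigma_t) a Gaussian AR(1) process."""
    rng = np.random.default_rng(seed)
    log_sigma = np.empty(n)
    # Start from the stationary distribution N(alpha, beta^2) (an assumption).
    log_sigma[0] = alpha + beta * rng.standard_normal()
    # Innovation standard deviation that keeps var(log sigma_t) equal to beta^2.
    innov_sd = beta * np.sqrt(1.0 - phi ** 2)
    for t in range(1, n):
        log_sigma[t] = (alpha + phi * (log_sigma[t - 1] - alpha)
                        + innov_sd * rng.standard_normal())
    eps = rng.standard_normal(n)   # i.i.d. N(0, 1), independent of the volatility
    return np.exp(log_sigma) * eps, log_sigma

returns, log_sigma = simulate_sv_returns()
print(returns.std(), log_sigma.mean(), log_sigma.std())
```

Repeating the simulation with different seeds and recording how often each test rejects at the 5% level gives an estimate of the size of that test.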

The first two columns of Table 6.3 show the empirical sizes of the tests when
the significance level is 5%; the autocorrelation tests incorporate finite-sample
bias corrections and variance estimates $b_\tau/n$. First, consider all the tests except
the BDS tests. For the 40 000 simulated series the estimated size figures have
standard errors close to 0.1%. The size estimates range from 3.3% to 6.5%. The
highest values are for the test statistics $T$ and $f_0$. These tests reject in only one
tail. The $z_{20}$ statistic is very highly correlated with $T$, here rejects in two tails and
has size very close to 5%. We conclude that asymptotic theory is only approxi-
mately valid when there are 2500 returns and the tests must be adjusted for ARCH
effects.

The BDS size figures are calculated from only 500 series, because calculations 
of BDS test statistics are relatively slow. All the series reject the i.i.d. hypothesis for
returns. The rescaled returns are closer to an i.i.d. process but the high empirical 
sizes show that the test is not reliable. This can be explained by the failure of 
asymptotic theory when volatility parameters are estimated (Hsieh 1991) and 
by mis-specification of the conditional variances when the rescaled returns are 
calculated. The critical values of the test can be adjusted to obtain a more correct 
size, but the values depend on the process assumed to generate returns and on 
the embedding dimension m. Appropriate 2.5% and 97.5% quantiles, for the 
stochastic volatility model simulated here, are —1.25 and 2.72 when m is 2, 
—0.78 and 3.00 when m is 4, and —0.36 and 3.39 when m is 8. These quantiles 
are used in Table 6.1, when counting significant results from the BDS test applied 
to rescaled returns. 


6.9.3 Power against an ARMA (1, 1) Alternative 


Test power has been estimated when the mean component $\{\mu_t\}$ is assumed to be
AR(1) with autoregressive parameter 0.95 and with variance equal to 2% of the
variance of the returns. These two parameters have been credible for currencies
for long periods during past years (Taylor 1992, 1994b). The volatility parameters
are unchanged. The simulated returns have an ARMA(1, 1) representation and
are from a process whose autocorrelation is $0.02(0.95^\tau)$ at a lag of $\tau$ periods.
Again series of 2500 returns are simulated and tests are performed using a 5%
significance level.
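
The autocorrelations follow from the independence of the components: the covariance of $r_t$ and $r_{t+\tau}$ equals that of $\mu_t$ and $\mu_{t+\tau}$, so the autocorrelation of returns is the variance share of $\{\mu_t\}$ multiplied by $0.95^\tau$. The lines below evaluate this at a few lags; the variable and function names are hypothetical.

```python
share = 0.02    # var(mu_t) / var(r_t), from the text
phi_mu = 0.95   # autoregressive parameter of the mean component, from the text

def return_autocorrelation(lag):
    """Autocorrelation of returns at a positive lag: share * phi_mu ** lag."""
    return share * phi_mu ** lag

for lag in (1, 2, 5, 20):
    print(lag, round(return_autocorrelation(lag), 3))
# 1 0.019
# 2 0.018
# 5 0.015
# 20 0.007
```

These values agree with the autocorrelations listed for the AR(1) component in the header of Table 6.3.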

The third and fourth columns of Table 6.3 show empirical power figures cal- 
culated from 10000 series, except for the BDS tests when 500 series are used. 
Ignoring the BDS figures for returns, the highest power figures are for those test

Table 6.3. Percentage size and power of random walk tests for three stochastic processes.
(Simulated returns are the sum of a linear process $\{\mu_t\}$ plus an uncorrelated, nonlinear process
$\{\sigma_t \varepsilon_t\}$. The process $\{\log(\sigma_t)\}$ is Gaussian and AR(1), and $\{\varepsilon_t\}$ is Gaussian white noise.)


Component $\mu_t$                                      Zero          AR(1)         MA(1)
Autocorrelations of returns at
  Lag 1:                                               0             0.019         0.05
  Lag 2:                                               0             0.018         0
  Lag 5:                                               0             0.015         0
  Lag 20:                                              0             0.007         0


Test evaluated using:
returns (R) or rescaled returns (RR)                   R     RR      R     RR      R     RR
Column                                                 1     2       3     4       5     6
Test
First autocorrelation, $\hat{\rho}_1$                  5.0   5.1     12    29      50    90
Number of significant autocorrelations, $N_\rho$       4.9   5.0     17    47      10    14
Portmanteau, to lag 10, $Q_{10}$                       5.23. «332    22    61      21    58
Portmanteau, to lag 30, $Q_{30}$                       52». <3       21    27      15    37
Portmanteau, to lag 50, $Q_{50}$                       5.8   5.2     19    50      12    29
Trend test, $T$                                        6.0   6.2     66    94      21    37
Variance ratio, week/day, $z_5$                        5.1   5.1     25    62      31    65
Variance ratio, month/day, $z_{20}$                    4.9   5.1     53    90      14    26
Multi-period first autocorrelation, $F(125)$           4.0   4.2     19    27      6     6
Multi-period regression, $J(250)$                      3.3   3.7     23    34      5     6
Spectral density at zero frequency, $f_0$              6.0   6.5     65    91      12    18
One-week cycle, $f_w$                                  6.0   5.6     5     4       9     10
Number of spectral peaks, $N_s$                        3:3- Ad       8     15      10    27
Runs test, $K$                                         5.2           22            73
Rescaled range, $(R/S)_{\mathrm{Man}}$                 ST.           20            6
Modified rescaled range, $(R/S)_{\mathrm{Lo}}$         3.6           7             4
BDS test, 2-period histories, $W_2$                    100   11.2    100   44      100   68
BDS test, 4-period histories, $W_4$                    100   16.8    100   48      100   5.8
BDS test, 8-period histories, $W_8$                    100   21.6    100   46      100   4.0


The mean, standard deviation, and autoregressive parameter of the Gaussian, AR(1)
process $\{\log(\sigma_t)\}$ are respectively $\alpha = -5.15$, $\beta = 0.422$, and $\phi = 0.973$. The size estimates in
columns 1 and 2 are obtained from 40 000 simulations and the power estimates in columns 3-6
are obtained from 10 000 simulations, except for the BDS figures obtained from 500
simulations. Each simulation provides a series of 2500 simulated returns. All tests are evaluated with
a significance level equal to 5%. Except for the BDS tests, all the test statistics have asymptotic
size equal to the significance level for the processes simulated. The BDS results in columns 4
and 6 are obtained using the critical values listed in Table 6.2. The following tests reject the null
hypothesis in only one tail: $N_\rho$, $Q_{10}$, $Q_{30}$, $Q_{50}$, $T$, $f_0$, $f_w$, $N_s$. Rescaled returns are returns
minus their sample mean divided by an estimate of their conditional standard deviation.


statistics that are linear functions of autocorrelations with weights that are mono- 
tonically decreasing. The high power estimates for returns are 66% for T, 65% 
for fo, and 53% for zoo. The next highest figure is far lower at 25%. Note that 
rejection in two tails reduces the power of z29 by about 12%. 
Test power is substantially increased by using rescaled returns and in some
instances the increase is remarkable. The three high estimates improve to 94%,
91%, and 90%. The next highest figure is 62% for $z_5$ compared with only 25%
when returns are not rescaled. The considerable increases in test power are caused
by two effects already discussed in Section 6.2. The first is a reduction in the
standard errors of autocorrelations. The second is the creation of more dependence
by rescaling (Taylor 1986, p. 177). A mathematical explanation of the second
effect is given in Appendix 6.13.

The BDS test is powerless according to the simulations reported here for 
rescaled returns. The power estimates are almost identical to the significance 
level when the appropriate quantiles are used. Higher rejection frequencies are 
obtained if the standard normal distribution is used instead, but even then the 
rejection frequencies are only a few per cent more than the numbers listed for the 
size calculations in the second column of Table 6.3. 


6.9.4 Power against an MA(1) Alternative


The test having most power depends, of course, on the assumed alternative to
randomness. Suppose now that the mean component $\{\mu_t\}$ is MA(1) and has
moving-average parameter 1 and variance equal to 10% of the variance of returns. The
simulated returns then follow an MA(1) process with first-lag population
autocorrelation equal to 0.05. The volatility parameters, series length, and significance
level are as before.
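
The first-lag autocorrelation of 0.05 can be verified with a short calculation. The sketch below assumes the usual MA(1) parametrization, in which the component equals an innovation plus the parameter times the previous innovation, so that its first-lag autocorrelation is the parameter divided by one plus its square; the function names are hypothetical.

```python
def ma1_first_autocorrelation(theta):
    """First-lag autocorrelation of an MA(1) process x_t = u_t + theta * u_{t-1}."""
    return theta / (1.0 + theta ** 2)


def returns_first_autocorrelation(theta, variance_share):
    """First-lag autocorrelation of returns when an MA(1) mean component
    carries `variance_share` of the variance of returns and the remaining
    component is uncorrelated."""
    return variance_share * ma1_first_autocorrelation(theta)


# A moving-average parameter of 1 and a 10% variance share.
print(returns_first_autocorrelation(1.0, 0.10))
```

With a parameter of 1 the component has first-lag autocorrelation 0.5, and scaling by the 10% variance share gives 0.05 for returns.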

The fifth and sixth columns of Table 6.3 show, as should be expected, that
test power is higher when fewer autocorrelations are used. The runs test $K$ is
a type of first-lag test and outperforms $\hat{\rho}_1$ for returns to a remarkable extent;
in general, $K$ is more powerful than $\hat{\rho}_1$ for returns whenever there is sufficient
variation in conditional variances. However, the runs test is second to $\hat{\rho}_1$ for
rescaled returns. Runs test values cannot be increased by rescaling but the values
of $\hat{\rho}_1$ generally increase and their standard errors generally decrease. The three
highest power estimates for returns are 73%, 50%, and 31%, respectively for $K$,
$\hat{\rho}_1$, and $z_5$. The corresponding power estimates for rescaled data are 73%, 90%,
and 65%. Once more, the rescaling transformation can substantially increase test
power.

The BDS test is again found to be powerless for rescaled returns. The low power 
from Monte Carlo results contrasts with the empirical power of 50% in Table 6.1 
for two of the tests applied to 20 real time series. There are at least two possible 
explanations. The BDS test may detect nonrandom behavior that is sufficiently 
different to that simulated. Alternatively, the quantiles used to test the real time 
series may be misleading because they are obtained by simulating a volatility 
process that is sufficiently different to those of real series. 




6.10 Sources of Minor Dependence in Returns 


Dependence among observed returns can be caused by time-varying expected 
returns, bid-ask spreads, price measurement errors, and rules that limit price 
changes. Any such dependence is small and less than that found in the series 
tested in this chapter. 


6.10.1 Time-Varying Expected Returns 


The null hypothesis of a random walk includes the assumption that returns have 
a stationary mean. Some variation in expected returns is compatible with market 
equilibrium. Finance theory does not require risk-free rates and risk premia to be 
constant through time and it would be surprising if they did not vary. 

Suppose now that the returns process is the sum of two independent stationary
components, $\{\mu_t\}$ and $\{e_t\}$, with the second a zero-mean uncorrelated process.
Then the autocorrelations of returns are

$$\rho_\tau = \operatorname{cov}(\mu_t, \mu_{t+\tau}) / \operatorname{var}(r_t), \qquad \tau > 0,$$

and hence bounds on the autocorrelations at all positive lags are given by

$$|\rho_\tau| \le \operatorname{var}(\mu_t) / \operatorname{var}(r_t) = \rho^*. \tag{6.17}$$

A few more assumptions provide an estimate for $\rho^*$. The expected return in
annual terms, equivalent to $\mu_t$, is $\exp(250\mu_t + 125\sigma^2) - 1$ when the uncorrelated
component is Gaussian with variance $\sigma^2$ and there are 250 trading days in a
year. Over a few years the range for annual expected returns might be at most
20%. The range for $\mu_t$, say $\mu'$ to $\mu''$, is then constrained by $250(\mu'' - \mu') \le \log(1.2)$.
A bound for the variance of $\mu_t$ is given by a uniform distribution, namely
$(\mu'' - \mu')^2/12$. A typical US stock has returns variance $0.016^2$ (see Table 4.1).
Thus one estimate of $\rho^*$ is

$$\rho^* = [\log(1.2)/250]^2 / [12(0.016)^2] = 0.0002.$$
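
The arithmetic behind this estimate is easily reproduced. The sketch below follows the steps just described: it converts the assumed 20% range for annual expected returns into a range for the daily mean, applies the uniform-distribution variance, and divides by the variance of daily returns. The function and argument names are illustrative assumptions.

```python
import math


def rho_star_uniform_bound(annual_range=1.2, trading_days=250, daily_std=0.016):
    """Estimate of the bound rho* when the daily expected return is spread
    uniformly over a range implied by `annual_range`."""
    daily_range = math.log(annual_range) / trading_days   # range for the daily mean
    variance_of_mean = daily_range ** 2 / 12.0            # variance of a uniform variable
    return variance_of_mean / daily_std ** 2              # ratio to the variance of returns


print(f"{rho_star_uniform_bound():.6f}")
```

The unrounded value is close to 0.00017, which the text reports as 0.0002.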


Another way of producing a value for $\rho^*$ is to follow Merton (1980) and suppose
returns are a constant risk-free rate $r_f$ plus a risk premium that is a function of
volatility plus an uncorrelated residual; thus,

$$r_t = r_f + a_j \sigma_t^j + \sigma_t \varepsilon_t.$$

When the risk premium is 10% per annum for the median level of volatility and
$\log(\sigma_t)$ has a normal distribution with plausible parameters for stocks, $\rho^*$ is 0.0001
when the risk premium equals $0.025\sigma_t$ and $\rho^*$ is 0.0007 when the premium is
$1.6\sigma_t^2$ (Taylor 1986).

Further simulation results for the components model $r_t = \mu_t + e_t$ are reported in
Table 6.4 for five different ways of defining $\mu_t$ and with $e_t$ either i.i.d. or defined by
the stochastic volatility process described in Section 6.9. The tabulated numbers
summarize changes in statistics caused by time-varying expectations. Thus, for
example, $\Delta\hat{\rho}_1$ is $\hat{\rho}_1$ calculated from returns $r_t$ minus $\hat{\rho}_1$ calculated from the
uncorrelated component $e_t$.


Table 6.4. Simulation estimates of the impact of expected return models on test statistics.

                                   $\Delta\hat{\rho}_1$                       $\Delta T$
Model for $\mu_t$     Volatility   Returns             Rescaled returns    Returns         Rescaled returns

$a_0 t$               C            0.0004 (0.0006)     0.0004 (0.0006)     0.07 (0.12)     0.08 (0.12)
                      V            0.0004 (0.0005)     0.0007 (0.0007)     0.09 (0.11)     0.13 (0.15)
$a_1 \sigma_t$        V            0.0004 (0.0007)     0.0005 (0.0007)     0.07 (0.13)     0.09 (0.12)
$a_2 \sigma_t^2$      V            0.0016 (0.0036)     0.0011 (0.0015)     0.28 (0.70)     0.21 (0.28)
Day effects           C            −0.0031 (0.0028)    −0.0029 (0.0027)    0.01 (0.09)     0.00 (0.09)
                      V            −0.0028 (0.0022)    −0.0045 (0.0033)    0.07 (0.11)     −0.03 (0.12)
Month effects         C            0.0009 (0.0009)     0.0007 (0.0012)     0.10 (0.15)     0.10 (0.16)
                      V            0.0005 (0.0009)     0.0020 (0.0018)     0.04 (0.11)     0.25 (0.23)

The models for expected returns are defined in Section 6.10. The unexpected component of returns
has volatility that is either constant (C) or variable (V) with its logarithm AR(1) and Gaussian. $\Delta\hat{\rho}_1$ is
the value of $\hat{\rho}_1$ for returns minus $\hat{\rho}_1$ for the uncorrelated component of returns. $\Delta T$ has a similar
definition. The tabulated numbers are averages for samples of 40 series, with sample standard deviations
in brackets. All autocorrelations are calculated from 2000 simulated returns. The results are taken from
Taylor (1986).

The specifications of $\mu_t$ are from Taylor (1986). First, results are given for a
long trend in expected returns, $\mu_t = a_0 t$, with $a_0$ chosen to make the expected
annual return increase by 20% over eight years. Second, the risk premium is linear
in volatility, with $\mu_t = 0.025\sigma_t$. Third, the premium is the quadratic function
$\mu_t = 1.6\sigma_t^2$. Fourth, day-of-the-week effects are specified by supposing that
Monday's expected return is 0.23% less than the expectations on all other days.
Finally, a month effect is given by supposing $\mu_t$ on any day in January is 0.18%
higher than $\mu_t$ on any day in any other month.

The simulation results show that the changes in test results caused by the 
assumed levels of variation in expected returns are very small. The calendar 
effects induce more dependence than can be attributed to a reasonable equilibrium 
model. None of the simulation results can explain the large observed increases 
in the variance ratio and T-statistics when rescaled returns replace returns in the 
tests. 

Random walk tests can be revised to test the null hypothesis that there is zero
autocorrelation among excess returns, $r_t - \mu_t$. Revised tests use an estimate $\hat{\rho}^*$
of the upper bound $\rho^*$. For example, the revised test based upon the first-lag
autocorrelation rejects the null hypothesis at the 5% level if

$$\frac{\sqrt{n}\,|\hat{\rho}_1|}{\sqrt{b_1}} > 1.96 + \frac{\hat{\rho}^* \sqrt{n}}{\sqrt{b_1}}.$$

The amount added to 1.96 is then 0.10 if there are 2500 returns, $b_1$ is 1, and $\hat{\rho}^*$
is 0.0020 (the highest first-lag, positive average in Table 6.4). Likewise, a revised
$T$-test rejects the null hypothesis at the 5% level if

$$T > 1.65 + 4.51 \hat{\rho}^* \sqrt{n}$$

and all the autocorrelation variances can be assumed to be at least $1/n$. The amount
added to 1.65 could then be as much as 0.45 for a series of 2500 returns when $\hat{\rho}^*$
is 0.0020.
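
The two adjustments quoted above can be reproduced numerically. The sketch below is built on an assumption: the amount added to 1.96 is taken to be the bound multiplied by the square root of n divided by b1, which is the form consistent with the quoted value of 0.10, and the amount added to 1.65 is 4.51 times the bound times the square root of n. The function names are hypothetical.

```python
import math


def first_lag_critical_value(rho_star, n, b1=1.0, z=1.96):
    """Revised critical value for the standardized first-lag autocorrelation.
    The adjustment rho_star * sqrt(n / b1) is an assumed form, chosen to
    match the value of 0.10 quoted in the text."""
    return z + rho_star * math.sqrt(n / b1)


def trend_test_critical_value(rho_star, n, z=1.65):
    """Revised one-tail critical value for the trend statistic T."""
    return z + 4.51 * rho_star * math.sqrt(n)


n, rho_star = 2500, 0.0020
print(round(first_lag_critical_value(rho_star, n) - 1.96, 3))
print(round(trend_test_critical_value(rho_star, n) - 1.65, 3))
```

For 2500 returns and a bound of 0.0020 the two adjustments are 0.10 and approximately 0.45.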


6.10.2 Bid-Ask Spreads 


Suppose each reported price $p_t$ is either a bid or an ask price when a market
closes. Let $p_t^*$ be the bid-ask average. Let $\delta_t$ be the measure of the discrepancy
between the reported and midpoint prices, defined by

$$\log(p_t) = \log(p_t^*) + \delta_t.$$

Also, let $r_t$ and $r_t^*$ be returns series calculated from price series $p_t$ and $p_t^*$. Then

$$r_t = r_t^* + \delta_t - \delta_{t-1}.$$

It may be assumed that the $\delta_t$ are uncorrelated and are independent of all terms
in the series $r_t^*$. Then

$$\operatorname{cov}(r_t, r_{t+1}) = \operatorname{cov}(r_t^*, r_{t+1}^*) - \operatorname{var}(\delta_t).$$

For stationary processes, the relationship between the theoretical autocorrelations
of reported and midpoint returns, respectively $\rho_\tau$ and $\rho_\tau^*$, depends on the ratio
$\psi = \operatorname{var}(\delta_t)/\operatorname{var}(r_t)$; thus,

$$\rho_1 = -\psi + (1 - 2\psi)\rho_1^*, \qquad \rho_\tau = (1 - 2\psi)\rho_\tau^* \ \text{for } \tau \ge 2. \tag{6.18}$$

The amount of spurious correlation is negative and essentially $-\psi$ at the first lag.
As the effect is negative, bid-ask spreads cannot explain the positive dependence
documented here for several series.

To illustrate the spurious correlation, consider exchange rate futures for which
the spread is less than 0.05% of the bid-ask average price. Each $\delta_t$ is then at most
half the spread in magnitude. If the $\delta_t$ average zero, $|\delta_t| \le 0.00025$ implies
$\operatorname{var}(\delta_t) \le 0.00025^2$. From Table 4.1, $\operatorname{var}(r_t) > 0.007^2$ and hence the
spurious theoretical autocorrelation is between −0.0013 and zero.
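
The bound of 0.0013 and the relationship between reported and midpoint autocorrelations can be sketched as follows. The code assumes the form of (6.18) that follows from the covariance result above, namely a spurious term equal to minus the ratio at the first lag together with a shrinkage factor of one minus twice the ratio at every lag; the function names are illustrative.

```python
def psi_upper_bound(max_abs_error, min_return_std):
    """Upper bound on psi = var(delta) / var(r) when |delta| is at most
    `max_abs_error` and the standard deviation of returns is at least
    `min_return_std`."""
    return (max_abs_error / min_return_std) ** 2


def reported_autocorrelation(lag, rho_mid, psi):
    """Autocorrelation of reported returns at a positive lag, given the
    autocorrelation of midpoint returns at the same lag (assumed form of 6.18)."""
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    spurious = -psi if lag == 1 else 0.0
    return spurious + (1.0 - 2.0 * psi) * rho_mid


psi = psi_upper_bound(0.00025, 0.007)
print(round(psi, 4))                                   # bound on the ratio psi
print(round(reported_autocorrelation(1, 0.0, psi), 4)) # uncorrelated midpoint returns
```

When midpoint returns are uncorrelated, the reported first-lag autocorrelation equals minus the ratio, so its most negative value here is about −0.0013.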


6.10.3 Random Data Errors 


Likewise, random data errors reduce the first-lag autocorrelation. The term $p_t^*$
then refers to the correct price, $\delta_t$ is occasionally nonzero and then indicates an
error, and $p_t$ is the price used by the researcher. Estimation of the effect $-\psi$ is
difficult although it is obviously negative.


6.10.4 Limit Rules


Some years ago, many futures contracts could not be traded at prices differing by
more than a predetermined amount from the previous day's close or settlement
price. There are now far fewer of these constraints. Some price series contain a
high frequency of limited prices. Roll (1984) reported that orange juice futures
contracts had limited prices on more than 10% of the days in his sample.

Limit restrictions create spurious positive dependence among returns. When a 
market closes limit-up, subsequent prices must move higher on average to reflect 
the information that caused prices to rise on the limit day. The opposite pattern 
occurs for limit-down events. 

Simulations provide approximate estimates of the autocorrelation induced by
limit rules. Let $\{p_t^*\}$ be a simulated random walk and, for convenience, let
market-limited prices $\{p_t\}$ be defined using some limit parameter $\theta$ by

$$p_t = \begin{cases} p_t^* & \text{if } (1+\theta)p_{t-1} \ge p_t^* \ge (1-\theta)p_{t-1}, \\ (1+\theta)p_{t-1} & \text{if } p_t^* > (1+\theta)p_{t-1}, \\ (1-\theta)p_{t-1} & \text{if } p_t^* < (1-\theta)p_{t-1}. \end{cases} \tag{6.19}$$

This definition of $\{p_t\}$ ignores intraday price movements that hit limits. The
autocorrelation induced by the limit rule is a function of the ratio $\theta/\sigma$, with $\sigma^2$
the variance of the uncorrelated returns $r_t^*$.
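
The limit rule (6.19) translates directly into code. The following minimal sketch applies the rule to a list of unlimited prices; it assumes that the first price is not limited, and the function name and the example prices are hypothetical.

```python
def apply_limit_rule(unlimited_prices, theta):
    """Return market-limited prices: each price may differ from the previous
    limited price by at most a proportion `theta` (rule 6.19)."""
    if not unlimited_prices:
        return []
    limited = [unlimited_prices[0]]      # assumption: the first price is unconstrained
    for p_star in unlimited_prices[1:]:
        previous = limited[-1]
        upper = (1.0 + theta) * previous
        lower = (1.0 - theta) * previous
        # Clip the unlimited price to the interval [lower, upper].
        limited.append(min(max(p_star, lower), upper))
    return limited


print(apply_limit_rule([100.0, 110.0, 110.0, 95.0], theta=0.05))
```

In this example the second day closes limit-up, the third day reaches the unlimited price, and the fourth day closes limit-down, which illustrates how information is spread across consecutive limited returns.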

Table 6.5 summarizes some simulation results when the process $\{r_t^*\}$ is first
Gaussian white noise and is second defined by the stochastic volatility process
described in Section 6.9. The impact of limit rules is small and occurs almost
entirely at the first lag. The limits create more spurious dependence among returns
than among rescaled returns, when there are changes in conditional variances.
Limited prices and hence extreme returns are more likely to occur in periods
of high volatility when their impact on tests is likely to be reduced by dividing
returns by volatility estimates.

The numbers of limited prices in the eleven futures series studied here are 99
for cattle contracts, 45 for corn, 16 for Treasury bonds, 14 for gold, 2 for the yen,
and none in the other series. Only the corn series is both constrained by limits
and has a significant test value for $\hat{\rho}_1$, for either returns or rescaled returns. These
test values remain significant at the 5% level if they are revised downwards using
the information in Table 6.5. None of the rejections of randomness for the futures
series can be explained solely by limit rules.


Table 6.5. Simulation estimates of the impact of a simple limit rule on test statistics.

                             Average values of:
              Per cent       $\Delta\hat{\rho}_1$   $\Delta\hat{\rho}_1$        $\Delta T$    $\Delta T$
$\theta/\sigma$    limit days     Returns      Rescaled returns  Returns     Rescaled returns

Gaussian noise
1.5           14.6           0.10         0.09              2.10        1.83
2             4.9            0.04         0.03              0.66        0.56
2.5           1.3            0.01         0.01              0.17        0.13
3             0.3            0.00         0.00              0.03        0.03

Stochastic volatility
1.5           10.6           0.14         0.07              4.59        1.82
2             5.2            0.09         0.03              2.59        0.86
2.5           2.7            0.06         0.02              1.57        0.44
3             1.5            0.04         0.01              0.99        0.24

The limit rule is defined in Section 6.10.4. Limited prices can only change by a proportion $\theta$, or
less, from day to day. $\sigma$ is the standard deviation of returns when there are no limits. Unlimited
returns are either Gaussian and i.i.d. or they are uncorrelated with a volatility process whose
logarithm is AR(1) and Gaussian, with standard deviation 0.6 and AR parameter 0.985. $\Delta\hat{\rho}_1$ is
the value of $\hat{\rho}_1$ for limited returns minus $\hat{\rho}_1$ for unlimited returns. $\Delta T$ has a similar definition.
All autocorrelations are calculated from 2000 simulated returns. The results are taken from
Taylor (1986).


6.11 Concluding Remarks


The results from a variety of tests show that many of the twenty time series tested
in Chapters 5 and 6 contradict the RWH while some of the series provide little
evidence, if any, against the hypothesis. There is evidence against randomness for
the spot stock indices and the futures on foreign exchange rates, but less evidence
for the futures on indices.

The dependence that is found in many series of daily returns is positive depend- 
ence. For these series, the random variables that generate returns on consecutive 
days are positively correlated and, in some cases, these variables are also posi- 
tively correlated beyond consecutive time periods. The dependence among daily 
returns is extremely small, as seen for example in Figures 5.5—5.7, and it appears 
to have decreased in recent years. Nevertheless it is more than can be explained 
by time-varying risk premia. 

Test power is relatively high for both the first autocorrelation test and the runs 
test evaluated on either equity series or simulations of MA(1) processes. Test 
power is also relatively high for linear functions of autocorrelations evaluated 
either on foreign exchange series or on simulations of ARMA(1, 1) processes 
that have positive dependence at all lags. Powerful linear functions have weights 
that decrease as the lag increases. Appropriate examples are defined by the T-test 
for trends, variance-ratio tests, and the estimated spectral density at zero fre- 
quency. 
6.12 Appendix: the Correlation between Test Values for
Two Correlated Series


Suppose $\{r_{1,t}\}$ and $\{r_{2,t}\}$ are two uncorrelated, cross-correlated processes, whose
linear interdependence is entirely contemporaneous, so that

$$\operatorname{cor}(r_{1,s}, r_{2,t}) = \begin{cases} \lambda & \text{if } s = t, \\ 0 & \text{otherwise.} \end{cases}$$

General results are presented for martingale differences, after results are first
derived for the simpler situation when the vectors $(r_{1,t}, r_{2,t})'$ follow a zero-mean
i.i.d. process. The univariate processes are assumed to have unit variance without
loss of generality.

To obtain the asymptotic correlation between the sample autocorrelations $\hat{\rho}_{1,\tau}$
and $\hat{\rho}_{2,\tau}$, defined here for positive lags $\tau$ by

$$\hat{\rho}_{i,\tau} = \sum_{t=1}^{n-\tau} r_{i,t} r_{i,t+\tau} \Big/ \sum_{t=1}^{n} r_{i,t}^2,$$

consider the variables

$$A_n = \frac{1}{n} \sum_{s=1}^{n-\tau} \sum_{t=1}^{n-\tau} r_{1,s} r_{1,s+\tau} r_{2,t} r_{2,t+\tau} \quad \text{and} \quad B_{i,n} = \frac{1}{n} \sum_{t=1}^{n} r_{i,t}^2$$

so that

$$n \hat{\rho}_{1,\tau} \hat{\rho}_{2,\tau} = A_n / (B_{1,n} B_{2,n}).$$

As $n$ increases, $B_{i,n} \to 1$ and, applying the i.i.d. assumption,

$$E[A_n] \to E[r_{1,t} r_{1,t+\tau} r_{2,t} r_{2,t+\tau}] = E[r_{1,t} r_{2,t}] E[r_{1,t+\tau} r_{2,t+\tau}] = \lambda^2.$$

Hence

$$n E[\hat{\rho}_{1,\tau} \hat{\rho}_{2,\tau}] \to \lambda^2, \qquad n \operatorname{cov}(\hat{\rho}_{1,\tau}, \hat{\rho}_{2,\tau}) \to \lambda^2,$$

and since $n \operatorname{var}(\hat{\rho}_{i,\tau}) \to 1$, from the i.i.d. assumption, it follows that

$$\operatorname{cor}(\hat{\rho}_{1,\tau}, \hat{\rho}_{2,\tau}) \to \lambda^2 \text{ also.}$$

Similar steps show that there is no asymptotic correlation between $\hat{\rho}_{1,\tau}$ and $\hat{\rho}_{2,\xi}$
whenever $\tau \ne \xi$. It then follows that for any constants $a_\tau$ the linear combinations
$L_1$ and $L_2$ defined by $L_i = \sum a_\tau \hat{\rho}_{i,\tau}$ (summing over some range $1 \le \tau \le k$)
also have correlation $\lambda^2$.

The preceding results can be extended to a bivariate, stationary, martingale
difference process that has $E[r_{2,t+\tau} \mid r_{2,t}, r_{1,s} r_{1,s+\tau}] = 0$ for $t > s$ and $\tau > 0$,
and likewise for the conditional expectation of $r_{1,t+\tau}$. Assuming $n \operatorname{var}(\hat{\rho}_{i,\tau}) \to \beta_{i,\tau}$,
the general result also depends on the autocorrelations $\rho_{\tau,P}$ of the products
$P_t = r_{1,t} r_{2,t}$ and the variance term $\kappa = \operatorname{var}(r_1 r_2)/(\operatorname{var}(r_1) \operatorname{var}(r_2))$:

$$\operatorname{cor}(\hat{\rho}_{1,\tau}, \hat{\rho}_{2,\tau}) = \frac{\lambda^2 + \kappa \rho_{\tau,P}}{(\beta_{1,\tau} \beta_{2,\tau})^{0.5}}.$$

Some estimates of $\lambda$ exceed 0.9, for example, for (rescaled) spot and futures returns
from the same index and for some pairs of (rescaled) currency returns. High
estimates imply that sample autocorrelations are then highly correlated across
series when the random walk hypothesis is true for both series.


6.13 Appendix: Autocorrelation Induced by Rescaling Returns 


Suppose returns $r_t$ are the sum of an autocorrelated component $\mu_t$ and an
uncorrelated component $\sigma_t \varepsilon_t$, as in Section 6.9, with $\sigma_t$ representing volatility. It is
shown here that rescaling will increase measures of dependence when appropriate
assumptions are made. We compare the autocorrelations of the processes
defined by

$$r_t = \mu_t + \sigma_t \varepsilon_t \quad \text{and} \quad y_t = \frac{r_t}{\sigma_t} = \frac{\mu_t}{\sigma_t} + \varepsilon_t.$$

Assuming the three processes $\{\mu_t\}$, $\{\sigma_t\}$, and $\{\varepsilon_t\}$ are stationary and stochastically
independent of each other, that $\mu_t$ has mean zero, and that the $\varepsilon_t$ are i.i.d. with
mean zero and variance one, the autocorrelations are

$$\rho_{\tau,r} = \frac{E[\mu_t \mu_{t+\tau}]}{E[\mu_t^2] + E[\sigma_t^2]} \quad \text{and} \quad \rho_{\tau,y} = \frac{E[\mu_t \mu_{t+\tau}] E[\sigma_t^{-1} \sigma_{t+\tau}^{-1}]}{E[\mu_t^2] E[\sigma_t^{-2}] + 1}.$$

Let

$$A = \frac{\operatorname{var}(\mu_t)}{\operatorname{var}(r_t)}, \qquad \omega = E[\sigma_t^2] E[\sigma_t^{-2}], \quad \text{and} \quad C = \frac{\omega}{1 + A(\omega - 1)}.$$

Then the ratio of the autocorrelations simplifies to

$$\frac{\rho_{\tau,y}}{\rho_{\tau,r}} = C \, \frac{E[\sigma_t^{-1} \sigma_{t+\tau}^{-1}]}{E[\sigma_t^{-2}]}.$$

The term $\omega$ is at least one by Jensen's inequality (and as $\omega = 1 - \operatorname{cov}(\sigma_t^2, \sigma_t^{-2})$).
Since realistic values of $A$ are almost zero, we anticipate that $C$ exceeds one.
Appropriate autocorrelations for the process $\{\sigma_t^{-1}\}$ are almost one for daily
variables (Chapters 11 and 12) and so we may anticipate that $\rho_{\tau,y}/\rho_{\tau,r} > 1$.

To illustrate a lower bound for the ratio, suppose $A = 0.05$ and $\{\log(\sigma_t)\}$ is
Gaussian with variance $\beta^2 = 0.25$. Then $\omega = \exp(4\beta^2) = e$ and $C = 2.5$.
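
The illustration can be checked numerically. The sketch below assumes that the multiplier C equals omega divided by one plus A times omega minus one, which is the form consistent with the quoted value of 2.5, and it uses the lognormal result that omega equals the exponential of four times the variance of log volatility. The function names are hypothetical.

```python
import math


def omega_lognormal(beta_squared):
    """omega = E[sigma^2] * E[sigma^-2] when log(sigma) is Gaussian with
    variance beta_squared; the mean of log(sigma) cancels."""
    return math.exp(4.0 * beta_squared)


def rescaling_factor(a, omega):
    """Multiplier C applied to the autocorrelation ratio (assumed form)."""
    return omega / (1.0 + a * (omega - 1.0))


omega = omega_lognormal(0.25)
print(round(omega, 3), round(rescaling_factor(0.05, omega), 2))
```

The factor falls as A rises, so a deliberately large value such as 0.05 gives a conservative figure; values of A nearer zero move the factor towards omega itself.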


7 


Trading Rules and Market Efficiency 


Trend-following trading rules have the potential to exploit any positive autocor- 
relation in the stochastic process that generates returns. Four of these rules are 
evaluated in this chapter. There have been long periods in the past when trading 
rules would have provided valuable information about future prices. However, 
their value has not been demonstrated in recent years. 


7.1 Introduction 


Trading rules are numerical methods that use a time series of prices to decide 
the quantity of an asset owned by a market participant. This chapter considers 
the information that can be obtained from trading rules. Both the paradigm of 
efficient markets and the low levels of autocorrelation among daily returns can 
motivate prior beliefs that trading rules cannot achieve anything of practical value. 
Nevertheless, it will be seen that trading rules have sometimes been able to gen- 
erate profitable investment decisions, particularly for trading foreign exchange, 
that challenge the efficient market hypothesis. 

Four trading rules are evaluated in some depth in this chapter. The double 
moving-average rule and the channel rule are two elementary rules that are popular 
in the literature of technical analysts, while the filter rule is an elementary rule 
that has often been tried in academic research. The fourth rule is designed around 
ARMA(1, 1) forecasts of future returns. These rules are defined in Section 7.2.
They all aspire to identify trends in recent prices that persist into the future. 

There are many other trading rules that are not analyzed here. Several further 
rules based upon technical analysis are defined rigorously and investigated by Lo, 
Mamaysky, and Wang (2000). They claim that some technical patterns provide 
useful information about the prices of US stocks. However, expected returns fol- 
lowing their patterns are similar to unconditional expectations (Jegadeesh 2000) 
and the same conclusion applies to UK firms (Dawson and Steeley 2003). Lukac 
and Brorsen (1990) find that many rules applied to commodity and financial 
futures prices make profits, but they do not conclude that these futures markets 
have been inefficient. 
Research into trading rules has a long history, which initially focused on efforts
to refute the notion of an efficient market. More recent research, pioneered by 
Brock, Lakonishok, and LeBaron (1992), instead uses trading rules to learn about 
the conditional means and variances of future returns. Their methods are described 
in Section 7.3 and are then evaluated for equity, currency, and other markets in 
Sections 7.4 and 7.5. Significant information about conditional means is uncov- 
ered by elementary rules, although its importance has diminished in recent years. 
Section 7.6 describes an Excel spreadsheet that implements the calculations of 
Brock et al. 

The weak form of the efficient market hypothesis is defined in Section 7.7 by 
a statement that says trading rules cannot outperform passive investment strate- 
gies, after considering trading costs and risk. Trading costs appear to have been 
sufficient to eliminate profit opportunities at major equity markets (Section 7.8), 
but not at currency markets (Section 7.9). Net trading profits at spot and futures 
foreign exchange markets during the 1970s and the 1980s can be explained in 
many ways, but the possibility of an inefficient market is shown to be credible in 
Section 7.10. Finally, the Monte Carlo results summarized in Section 7.11 show 
that very low levels of positive dependence among returns can be exploited by 
trend-following trading rules. 


7.2 Four Trading Rules 


A trading rule is a method for converting the history of prices into investment
decisions. A typical decision variable at time $t$ is the quantity $q_{t+1}$ of an asset that is
owned from the time of price observation $t$ until the next observation at time $t + 1$.
The quantity $q_{t+1}$ is some function of the price history $I_t = \{p_t, p_{t-1}, p_{t-2}, \ldots\}$.
The time counter $t$ refers to trading days throughout this chapter, unless stated
otherwise.

Most rules restrict the quantities to a few possible levels. At time $t$ we consider
a maximum of three levels, whose values might depend on $I_t$. We call day $t + 1$
a Buy day if the quantity is the highest possible, a Sell day if it is the lowest
possible, and a Neutral day if the quantity is some intermediate value. Typical
values of the three quantities are 1, 0, and −1 for futures contracts and 2, 1, and
0 for equities.

Trading actions are determined by the classifications of days and vice versa.
For example, consider two scenarios when day $t$ is a Buy and day $t + 1$ is a Sell.
First, a futures trader will liquidate a long position and initiate a short position
at the close of day $t$, perhaps with $q_t = 1$ and $q_{t+1} = -1$. Alternatively, if the
asset is a stock that cannot be sold short, a trader will sell all he or she owns and
invest the proceeds in a deposit account, again at the close of day $t$, so $q_t > 0$ and
$q_{t+1} = 0$.

The ambitions of traders have inevitably led to the creation of a multitude of
trading rules. These rules have the objective of owning more of the asset when 
expected returns are higher. There is some predictability in the returns process 
whenever these expectations are fulfilled. The four trading rules investigated in 
this chapter are all designed to identify the current trend in prices, typically up 
or down. The rules will be profitable if prices generally move in the direction of 
the current trend for a sufficiently long period of time. The most attention will be 
given to the first of the four rules. 


7.2.1 The Moving-Average Rule 


A comparison of two moving averages defines one of the rules most frequently
mentioned in market literature. Two averages of lengths $S$ (a short period of
time) and $L$ (a longer period) are calculated at time $t$ from the most recent price
observations, including $p_t$:

$$a_{t,S} = \frac{1}{S} \sum_{j=1}^{S} p_{t-S+j}, \qquad a_{t,L} = \frac{1}{L} \sum_{j=1}^{L} p_{t-L+j}. \tag{7.1}$$

We consider the relative difference between the short- and long-term averages,
measured by

$$R_t = \frac{a_{t,S} - a_{t,L}}{a_{t,L}}. \tag{7.2}$$

Brock et al. (1992) includes a statement that the most popular parameter
combinations have $S \le 5$ and $L \ge 50$.

When the short-term average is above [below] the long-term average, recent
prices are higher [lower] than older prices and it may be imagined that prices are
following an upward [downward] trend. When the two averages are similar, it
may be argued that the information is not precise enough to form a view about
the trend. Consequently, Brock et al. (1992) classify time period $t + 1$, which is
the time from the close on day $t$ until the close on day $t + 1$, as

$$\text{Buy if } R_t > B, \qquad \text{Neutral if } -B \le R_t \le B, \qquad \text{Sell if } R_t < -B. \tag{7.3}$$

This classification algorithm has three parameters: $S$, $L$, and $B$. The bandwidth
$B$ can be zero and then (almost) all days are either Buys or Sells.
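
The classification (7.3) can be sketched in a few lines. The code below is a minimal illustration rather than the implementation used for the results in this chapter; it assumes prices are held in a list indexed from zero, and the function names and the dictionary keyed by the index of the classified day are choices made for the example.

```python
def moving_average(prices, t, length):
    """Average of the `length` most recent prices up to and including index t."""
    window = prices[t - length + 1 : t + 1]
    return sum(window) / length


def classify_moving_average(prices, short, long, band=0.0):
    """Classify day t + 1 as Buy, Sell, or Neutral from the relative
    difference between the short and long averages calculated at time t."""
    labels = {}
    for t in range(long - 1, len(prices)):
        a_short = moving_average(prices, t, short)
        a_long = moving_average(prices, t, long)
        r = (a_short - a_long) / a_long
        if r > band:
            labels[t + 1] = "Buy"
        elif r < -band:
            labels[t + 1] = "Sell"
        else:
            labels[t + 1] = "Neutral"
    return labels


rising = [100, 101, 102, 103, 104]
print(classify_moving_average(rising, short=1, long=3))
print(classify_moving_average(rising, short=1, long=3, band=0.01))
```

With a zero bandwidth every day of the rising series is a Buy, while a bandwidth of 1% makes the same days Neutral because the short average is less than 1% above the long average.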

Figure 7.1 shows moving averages for the S&P 100 index from July to December
2000 when $S = 1$ and $L = 50$, respectively marked by filled diamonds and
empty circles. When $B$ is zero, most days in July and August are Buys, and most
days in October, November, and December are Sells. The two dotted curves are
1% above and below the long-term average. A few index levels are between these
curves and hence there are a few Neutral days when $B = 0.01$.

Figure 7.1. Moving averages for the S&P 100 index.

The calculations for all the rules may be misleading if they are applied to price
series that contain predictable jumps, particularly series that are constructed from
several futures contracts. A satisfactory way to apply the rules to futures data
replaces the price series $\{p_t\}$ by a wealth series $\{w_t\}$, constructed from returns
$\{r_t\}$ using the definition $w_t = w_{t-1} \exp(r_t)$.
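
A wealth series is simple to construct from returns. The sketch below compounds continuously compounded returns from an initial wealth level; the starting value of one and the function name are assumptions made for the example.

```python
import math


def wealth_series(returns, initial_wealth=1.0):
    """Build a wealth series by compounding each return:
    the next wealth is the previous wealth times exp(return)."""
    wealth = [initial_wealth]
    for r in returns:
        wealth.append(wealth[-1] * math.exp(r))
    return wealth


print(wealth_series([0.01, -0.02, 0.015]))
```

Because the series is built from returns alone, the price jumps that occur when one futures contract replaces another do not enter the trading rule calculations.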


7.2.2 The Channel Rule 


Lukac, Brorsen, and Irwin (1988) found that a channel rule performed best in their 
study of several technical trading systems. Their version of the channel trading 
rule assumes that a futures trader is always in the market. The description here 
adapts the rule so that neutral positions can be taken when a bandwidth parameter 
B is positive. 

By analogy with the moving-average rule, the short-term average is replaced
by the most recent price (so $S = 1$) and the long-term average is replaced by
either a minimum or a maximum of the $L$ previous prices, respectively defined
by

$$m_{t-1} = \min(p_{t-L}, \ldots, p_{t-2}, p_{t-1}),$$
$$M_{t-1} = \max(p_{t-L}, \ldots, p_{t-2}, p_{t-1}). \tag{7.4}$$

A person who believes prices have been following an upward [downward] trend
may be willing to believe the trend has changed direction when the latest price is
less [more] than all recent previous prices.

The rule has two parameters: the channel length $L$ and the bandwidth $B$. The
algorithm to classify day $t + 1$ depends on the classification of the previous day.
If day $t$ is a Buy, then day $t + 1$ is

$$\text{Buy if } p_t > (1 + B)m_{t-1}, \qquad \text{Sell if } p_t < (1 - B)m_{t-1}, \qquad \text{Neutral otherwise.} \tag{7.5}$$

Likewise, if day $t$ is a Sell, then symmetric principles classify day $t + 1$ as

$$\text{Sell if } p_t < (1 - B)M_{t-1}, \qquad \text{Buy if } p_t > (1 + B)M_{t-1}, \qquad \text{Neutral otherwise.} \tag{7.6}$$

For a Neutral day $t$, day $t + 1$ is

$$\text{Buy if } p_t > (1 + B)M_{t-1}, \qquad \text{Sell if } p_t < (1 - B)m_{t-1}, \qquad \text{Neutral otherwise.} \tag{7.7}$$

Figure 7.2 shows values of $p_t$, $M_{t-1}$, and $m_{t-1}$ for the S&P 100 index when
$L = 40$. When $B$ is zero, all the days until late September are Buys, as $p_t > m_{t-1}$.
The first Sell day is identified on 22 September when $p_t < m_{t-1}$ and the remaining
days are all Sells as then $p_t < M_{t-1}$.

Figure 7.2. Channels for the S&P 100 index.
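
The state-dependent classifications (7.5) to (7.7) can be sketched as a small state machine. The code below is an illustration under stated assumptions: prices are held in a list indexed from zero, the day before the first classification is treated as Neutral, and the inequalities are strict, so a price exactly equal to a channel boundary gives a Neutral day. The function name is hypothetical.

```python
def classify_channel(prices, length, band=0.0, initial="Neutral"):
    """Classify day t + 1 with the channel rule, using the minimum and
    maximum of the `length` prices before the latest price."""
    labels = {}
    state = initial
    for t in range(length, len(prices)):
        previous = prices[t - length : t]      # the L prices before p_t
        low, high = min(previous), max(previous)
        p = prices[t]
        if state == "Buy":                     # rule (7.5)
            if p > (1 + band) * low:
                new = "Buy"
            elif p < (1 - band) * low:
                new = "Sell"
            else:
                new = "Neutral"
        elif state == "Sell":                  # rule (7.6)
            if p < (1 - band) * high:
                new = "Sell"
            elif p > (1 + band) * high:
                new = "Buy"
            else:
                new = "Neutral"
        else:                                  # rule (7.7)
            if p > (1 + band) * high:
                new = "Buy"
            elif p < (1 - band) * low:
                new = "Sell"
            else:
                new = "Neutral"
        labels[t + 1] = new
        state = new
    return labels


print(classify_channel([10, 11, 12, 13, 12.5, 9, 9.5], length=3))
```

In the example the trader becomes a buyer when the price breaks above the previous three prices, stays long while the price remains above the channel minimum, and switches to Sell only when the price falls below that minimum.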


7.2.3 The Filter Rule 


Alexander (1961) invented a filter rule that has often been used in academic studies 
to separate days into two sets based upon a trader's market position. Sweeney 
(1986, 1988) used the filter rule to challenge market efficiency, respectively for 
foreign exchange and US firms. The rule is adapted here to allow for a band within 
which neutral positions are taken. 

An analogy with the moving-average rule can again be attempted: the short-term
average is replaced by the most recent price and the long-term average is
replaced by some multiple of the maximum or minimum since the most recent
trend is believed to have commenced. The terms $m_{t-1}$ and $M_{t-1}$ are now defined, for
a positive filter size parameter $f$ and a trend commencing at time $s$, by

$$m_{t-1} = (1 - f) \max(p_s, \ldots, p_{t-2}, p_{t-1}),$$
$$M_{t-1} = (1 + f) \min(p_s, \ldots, p_{t-2}, p_{t-1}). \tag{7.8}$$

A person may now believe an upward [downward] trend has changed direction
when the latest price has fallen [risen] by a fraction $f$ from the highest [lowest]
price during the upward [downward] trend.

The parameters of the rule are the filter size f and the bandwidth B. Classification
of days follows almost the same methods as for the channel rule. If day t
is a Buy, then s + 1 is the earliest Buy day for which there are no intermediate
Sell days and day t + 1 is classified using (7.5); it is possible that s + 1 = t.
Likewise, if day t is a Sell, then s + 1 is the earliest Sell day for which there
are no intermediate Buy days and day t + 1 is classified using (7.6). If day t is
Neutral, then find the most recent non-Neutral day and use its value of s: if this
non-Neutral day is a Buy, then apply (7.5) and otherwise apply (7.6). To start
classification, the first non-Neutral day is identified when either $p_t > (1 + B)M_{t-1}$
or $p_t < (1 - B)m_{t-1}$ with s = 1.
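
A minimal Python sketch of this bookkeeping follows. It uses 0-based list positions, so the starting value s = 1 of the text becomes index 0. The name `filter_rule` is hypothetical, and the order in which the two start-up conditions are checked is an assumption, as the text does not say which takes precedence.

```python
from typing import List


def filter_rule(prices: List[float], f: float, B: float) -> List[str]:
    """Classify days with the filter rule; labels[t + 1] is decided on day t."""
    n = len(prices)
    labels = [""] * n
    trend = None  # most recent non-Neutral class; None before classification starts
    s = 0         # start of the current trend (s = 1 in the text's 1-based notation)
    for t in range(1, n - 1):
        window = prices[s:t]          # p_s, ..., p_{t-1}
        m = (1 - f) * max(window)     # equation (7.8)
        M = (1 + f) * min(window)
        p = prices[t]
        if trend == "Buy":            # rule (7.5), also for Neutral days after a Buy
            new = "Buy" if p > (1 + B) * m else "Sell" if p < (1 - B) * m else "Neutral"
        elif trend == "Sell":         # rule (7.6), also for Neutral days after a Sell
            new = "Sell" if p < (1 - B) * M else "Buy" if p > (1 + B) * M else "Neutral"
        else:                         # start-up: wait for the first non-Neutral day
            new = "Buy" if p > (1 + B) * M else "Sell" if p < (1 - B) * m else "Neutral"
        if new != "Neutral" and new != trend:
            trend, s = new, t         # the new trend starts on day t + 1, so s = t
        labels[t + 1] = new
    return labels
```

Neutral days leave both the trend and s unchanged, so a Buy that follows a run of Neutral days continues the earlier upward trend rather than starting a new one.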


7.2.4 A Statistical Rule 


Trading rules based upon ARMA(1, 1) models for rescaled returns are investigated 
in several publications, commencing with Taylor (1983, 1986). The particular 
rule used in Taylor (1992) is discussed in this chapter and evaluated for futures 
contracts. It would have made profits for currency trades in the 1980s, to be 
discussed further in Sections 7.9 and 7.10. Note immediately, however, that these 
profits are slightly less than those from the simpler moving-average, channel, and 
filter rules. 

The ARMA(1, 1) process is defined by equation (3.23) and its autocorrelations
are $A\phi^{\tau}$. The parameter A is positive for the price-trend model introduced by
equation (3.42). From A and the autoregressive parameter $\phi$ we can obtain the
moving-average parameter $\theta$ from equation (3.27). The statistical trading rule
uses ARMA forecasting theory applied to rescaled returns defined by $r_t/\sqrt{h_t}$,
with the conditional standard deviation $\sqrt{h_t}$ obtained from a special case of the
simple ARCH specification introduced in Section 5.7.

The rule relies on a standardized forecast $k_t$, given by the one-day-ahead
forecast divided by an estimate of its standard error,

$$k_t = f_{t,1} / \hat{\sigma}_f. \tag{7.9}$$

This is evaluated using the equations

$$
\begin{aligned}
f_{t,1} &= (h_{t+1}/h_t)^{1/2}\,[(\phi + \theta)\,r_t - \theta f_{t-1,1}], \\
\hat{\sigma}_f &= \sqrt{h_{t+1}}\,\{A\phi(\phi + \theta)/(1 + \phi\theta)\}^{1/2}, \\
\sqrt{h_{t+1}} &= 0.9\sqrt{h_t} + 0.1253\,|r_t|.
\end{aligned}
\tag{7.10}
$$

An upward [downward] trend is predicted when $k_t$ is positive [negative].
A nonnegative threshold parameter $k^*$ determines the classification of days. If
day t is a Buy, then day t + 1 is

$$\text{Buy if } k_t > 0, \quad \text{Sell if } k_t < -k^*, \quad \text{Neutral otherwise.} \tag{7.11}$$

Likewise, if day t is a Sell, then day t + 1 is

$$\text{Sell if } k_t < 0, \quad \text{Buy if } k_t > k^*, \quad \text{Neutral otherwise.}$$

The day after a Neutral day t is

$$\text{Buy if } k_t > k^*, \quad \text{Sell if } k_t < -k^*, \quad \text{Neutral otherwise.}$$
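
The recursions in (7.9)-(7.11) can be sketched as follows. The sketch takes A, $\phi$, and $\theta$ as given (equation (3.27) is not reproduced here), and it assumes a starting forecast of zero, a user-supplied starting variance `h0`, and a Neutral starting position. The function names are hypothetical.

```python
import math
from typing import List


def standardized_forecasts(returns: List[float], A: float, phi: float,
                           theta: float, h0: float) -> List[float]:
    """Return k_t for each return r_t using equations (7.9) and (7.10)."""
    scale = math.sqrt(A * phi * (phi + theta) / (1 + phi * theta))
    sqrt_h = math.sqrt(h0)  # assumption: sqrt(h_t) for the first return
    f_prev = 0.0            # assumption: the forecast made before the first return is 0
    ks = []
    for r in returns:
        sqrt_h_next = 0.9 * sqrt_h + 0.1253 * abs(r)
        f = (sqrt_h_next / sqrt_h) * ((phi + theta) * r - theta * f_prev)
        sigma_f = sqrt_h_next * scale
        ks.append(f / sigma_f)
        sqrt_h, f_prev = sqrt_h_next, f
    return ks


def classify_statistical(ks: List[float], k_star: float) -> List[str]:
    """labels[i] is the class of the day after the observation that produced ks[i]."""
    labels = []
    state = "Neutral"  # assumption: neutral starting position
    for k in ks:
        if state == "Buy":
            new = "Buy" if k > 0 else "Sell" if k < -k_star else "Neutral"
        elif state == "Sell":
            new = "Sell" if k < 0 else "Buy" if k > k_star else "Neutral"
        else:
            new = "Buy" if k > k_star else "Sell" if k < -k_star else "Neutral"
        labels.append(new)
        state = new
    return labels
```

The asymmetry in the thresholds means that an existing position is only abandoned when the forecast changes sign, while a new position requires the forecast to exceed $k^*$ in magnitude.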


7.3 Measures of Return Predictability 


Trading rules provide information about future returns whenever the returns on 
Buy days have a different distribution to the returns on Sell days. We define two 
sets of time indices I and J as follows:


t is in I, denoted by $t \in I$, if period t + 1 is classified as a Buy,
t is in J, denoted by $t \in J$, if period t + 1 is classified as a Sell.


Then we may say that a trading rule applied to a stationary stochastic process
representing prices is uninformative if the conditional densities $f(r_{t+1} \mid t \in I)$
and $f(r_{t+1} \mid t \in J)$ are identical; the rule is informative if these conditional
densities are different.
A trading rule will be informative if expected returns depend on the Buy/Sell
information, i.e.

$$E[r_{t+1} \mid t \in I] \neq E[r_{t+1} \mid t \in J]. \tag{7.12}$$


Given a time series of observed prices it is natural to assess the evidence about 
this inequality by using a difference between sample means. Let the number of 
time indices in the Buy and Sell sets be denoted by $n_I$ and $n_J$ respectively. The
average returns for Buy and Sell days are

$$\bar{r}_I = \frac{1}{n_I} \sum_{t \in I} r_{t+1} \quad \text{and} \quad \bar{r}_J = \frac{1}{n_J} \sum_{t \in J} r_{t+1} \tag{7.13}$$

and hence a measure of predictability is $\bar{r}_I - \bar{r}_J$.

An obvious test of the null hypothesis that the rule is uninformative requires
calculation of standard deviations $s_I$ and $s_J$, respectively from the Buy returns
and the Sell returns, followed by a comparison of

$$z = (\bar{r}_I - \bar{r}_J) \Big/ \left( \frac{s_I^2}{n_I} + \frac{s_J^2}{n_J} \right)^{0.5} \tag{7.14}$$

with the standard normal distribution. Brock et al. (1992) obtain highly significant 
positive values of a similar statistic, which are reviewed in the next section. 
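
A short Python sketch of the statistic in (7.14) follows. It assumes that `returns[t]` and `labels[t]` refer to the same day and that the standard deviations use the n - 1 divisor; the function name `z_statistic` is hypothetical.

```python
import math
from statistics import mean, stdev
from typing import List, Tuple


def z_statistic(returns: List[float], labels: List[str]) -> Tuple[float, float, float]:
    """Return the Buy mean, the Sell mean, and z of equation (7.14)."""
    buy = [r for r, c in zip(returns, labels) if c == "Buy"]
    sell = [r for r, c in zip(returns, labels) if c == "Sell"]
    mean_i, mean_j = mean(buy), mean(sell)
    s_i, s_j = stdev(buy), stdev(sell)  # sample standard deviations (n - 1 divisor)
    z = (mean_i - mean_j) / math.sqrt(s_i ** 2 / len(buy) + s_j ** 2 / len(sell))
    return mean_i, mean_j, z
```

Each set needs at least two returns for the standard deviations to be defined. Neutral days are simply ignored.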

A second way to find evidence for predictability is to show that the probability of 
the price rising depends on the trading rule information. Buy and Sell probabilities 
can be estimated, after removing any zero returns, by 


$$p_I = P(r_{t+1} > 0 \mid t \in I \text{ and } r_{t+1} \neq 0), \tag{7.15}$$


and likewise $p_J$, and hence the difference $p_I - p_J$ can be calculated. A binomial
test is used by Brock et al. to show that their sample differences are highly
significant.
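
The estimates of $p_I$ and $p_J$ are simple proportions, as the following sketch shows. The details of the binomial test used by Brock et al. are not given here, so the third value returned is a standard pooled two-proportion normal approximation used as a stand-in, not necessarily their statistic. The function name is hypothetical.

```python
import math
from typing import List, Tuple


def up_probabilities(returns: List[float], labels: List[str]) -> Tuple[float, float, float]:
    """Estimate p_I and p_J of (7.15) and a pooled two-proportion statistic."""

    def share_up(cls: str) -> Tuple[float, int]:
        # Zero returns are removed before the proportion is calculated.
        nonzero = [r for r, c in zip(returns, labels) if c == cls and r != 0]
        return sum(r > 0 for r in nonzero) / len(nonzero), len(nonzero)

    p_i, n_i = share_up("Buy")
    p_j, n_j = share_up("Sell")
    pooled = (p_i * n_i + p_j * n_j) / (n_i + n_j)
    # Stand-in test statistic (an assumption, see the lead-in above).
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_i + 1 / n_j))
    return p_i, p_j, (p_i - p_j) / se
```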

Brock et al. also show that something can be learnt about conditional second 
moments from trading rules. Evidence for such predictability is measured by 
the difference between standard deviations, $s_J - s_I$, which indicates whether
or not volatility is higher when Sell classifications are made than when Buy 
classifications occur. 


7.3.1 Interpretation of the Test Statistic z 


It is instructive to clarify the most general null hypothesis about returns that can 
be tested by the methodology that leads to the z-test based upon equation (7.14). 
We now show that z can be used to test the null hypothesis that returns in excess of 
a stationary mean are generated by a stationary martingale difference. This null 
hypothesis is the definition of the random walk hypothesis given by definition 
RWH2 on p. 101, with the additional assumption of stationarity. Distributional 
assumptions are unimportant when z is evaluated for large samples, as a version 
of the central limit theorem will ensure that the distribution of z is approximately 
normal. 

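Let $i_t$ be 1 if the information up to time t is used to classify period t + 1 as
a Buy, i.e. $t \in I$, and otherwise let $i_t$ be 0. Likewise, $j_t$ is 1 if $t \in J$ and it is 0
otherwise. Then the difference between Buy and Sell averages, for classifications
of times t = 1, 2, ..., n, is

$$
\begin{aligned}
\bar{r}_I - \bar{r}_J &= \frac{1}{n_I} \sum_{t=1}^{n} i_t r_{t+1} - \frac{1}{n_J} \sum_{t=1}^{n} j_t r_{t+1} \\
&= \frac{1}{n} \sum_{t=1}^{n} \left( \frac{i_t}{\sum_s i_s / n} - \frac{j_t}{\sum_s j_s / n} \right) r_{t+1}.
\end{aligned}
\tag{7.16}
$$

Now assume stationarity and let $E[r_t] = \mu$. Then the large sample properties of
$\bar{r}_I - \bar{r}_J$ are unaltered if we replace $\sum i_t / n$ by $E[i_t]$. Defining

$$F_t = (i_t / E[i_t]) - (j_t / E[j_t]),$$

we have $E[F_t] = 0$ and

$$
F_t =
\begin{cases}
1/E[i_t], & \text{if } t \in I, \\
-1/E[j_t], & \text{if } t \in J, \\
0, & \text{otherwise.}
\end{cases}
$$

The large sample properties of $\bar{r}_I - \bar{r}_J$ are thus the same as those of

$$d = \frac{1}{n} \sum_{t=1}^{n} F_t (r_{t+1} - \mu). \tag{7.17}$$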


The expectation of d is zero when the excess returns $r_t - \mu$ are a martingale
difference (MD), because then $E[r_{t+1} - \mu \mid F_t] = 0$. The variance of d, multiplied
by n, simplifies to $E[F_t^2 (r_{t+1} - \mu)^2]$, by applying the MD and stationarity
assumptions, and this expectation is consistently estimated by $n(s_I^2/n_I + s_J^2/n_J)$.
It follows that the asymptotic distribution of z is N(0, 1) when the process generating
excess returns is a stationary martingale difference. Such processes are
white noise and are hence uncorrelated. Many examples are given in Chapter 9
when we define ARCH specifications that have constant conditional means.
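
The variance step can be spelled out as follows. This is a sketch under the MD and stationarity assumptions rather than a full proof, and it uses the fact that a day cannot be both a Buy and a Sell, so that $i_t j_t = 0$.

```latex
\begin{align*}
n\,\mathrm{var}(d)
  &= \frac{1}{n} \sum_{t=1}^{n} E\!\left[F_t^2 (r_{t+1} - \mu)^2\right]
     && \text{(cross terms vanish for an MD)} \\
  &= E\!\left[F_t^2 (r_{t+1} - \mu)^2\right]
     && \text{(stationarity)} \\
  &= \frac{E\!\left[i_t (r_{t+1} - \mu)^2\right]}{E[i_t]^2}
   + \frac{E\!\left[j_t (r_{t+1} - \mu)^2\right]}{E[j_t]^2}
     && \text{(since } i_t j_t = 0 \text{)} \\
  &\approx \frac{(n_I/n)\, s_I^2}{(n_I/n)^2} + \frac{(n_J/n)\, s_J^2}{(n_J/n)^2}
   = n \left( \frac{s_I^2}{n_I} + \frac{s_J^2}{n_J} \right).
\end{align*}
```

The last line replaces each expectation by its sample counterpart, which gives the denominator of z in (7.14) after dividing by n and taking the square root.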


7.3.2 Power of the Test Statistic z 


A sample value of d calculated from a sample of returns is almost identical to
$\mathrm{cov}(r_{t+1}, F_t)$, when $\mu$, $E[i_t]$, and $E[j_t]$ are respectively replaced by $\bar{r}$, $n_I/n$,
and $n_J/n$. Thus a sample value of d essentially equals the sample covariance of
$r_{t+1}$ with a nonlinear function of returns (up to time t), determined by the trading
rule, which has only three possible values. The variance-ratio test of the random
walk hypothesis, presented in Section 5.3, which compares single- and N-period
variances, is essentially a test based upon the following covariance:

$$\mathrm{cov}\left( r_t, \sum_{\tau=1}^{N-1} (N - \tau)\, r_{t-\tau} \right).$$

Consequently, the trading rule z-test is only likely to be more powerful than a
variance-ratio test when $r_{t+1}$ correlates more highly with the nonlinear, trading
rule function $F_t(r_t, r_{t-1}, \ldots)$ than with the linear function $\sum (N - \tau)\, r_{t+1-\tau}$.
Acar (1993, 2002) has presented several related results for the single moving-average
rule without a band (so S = 1 and B = 0). When expected returns
are zero, the quantity $\tfrac{1}{2}F_t$ is then approximately the sign (+1 or -1) of a linear
function of returns. It seems probable in these circumstances that the z-test is less
powerful than an appropriate variance-ratio test. This prediction is supported by
empirical test results for US and UK equity returns (Taylor 2000). Acar (1993) also
shows that the single moving-average rule has optimal properties when returns
follow a Gaussian ARMA(1, 1) process.

The Monte Carlo study of test power in Section 6.9 shows that selected variance- 
ratio and autocorrelation tests of the random walk hypothesis have power above 
90% when applied to 2500 returns defined by an ARMA(1, 1) process whose 
autocorrelations are $0.02(0.95^{\tau})$. Further simulations show that the power of
the trading rule z-test is also high when the trading rule parameters are chosen
appropriately: 74% for the moving-average rule when S = 1, L = 30, and
B = 0.01; 67% for the channel rule when L = 10 and B = 0; and 74% for the
filter rule when f = 0.02 and B = 0.01, all for one-tail tests.


7.4 Evidence about Equity Return Predictability 
7.4.1 US Equities 


Brock et al. (1992) use various technical rules to identify Buy and Sell days for 
the Dow Jones Industrial Average (DJIA) from 1897 to 1986. The DJIA does 
not include any dividend payments and this index has contained thirty stocks 
since 1928. Brock et al. document significant, positive values of the mean return
and probability differences, $\bar{r}_I - \bar{r}_J$ and $p_I - p_J$, and of the standard deviation
differences, $s_J - s_I$. Thus the first and second moments of these returns have been
predictable to some degree and hence technical rules have been informative. Four
conclusions are particularly interesting, and are discussed here for the moving-average
rule. Ten parameter combinations are evaluated by Brock et al., all of
which have $1 \le S \le 5$, $50 \le L \le 200$, and B = 0 or 1%.

First, “buy signals consistently generate higher returns than sell signals.” The
average returns on Buy and Sell days, across all parameter combinations, are 
respectively equivalent to 12% and -7% per annum (p.a.) (Brock et al., Table II).
Tests on the differences between Buy and Sell average returns, using the z-statistic 
in equation (7.14), provide highly significant values of z for each of the ten param- 
eter combinations evaluated, ranging from 3.79 to 6.04. The null hypothesis is also 
rejected at the 5% significance level for each subperiod considered: 1897-1914, 
1915-1938, 1939-1962, and 1962-1986. The respective subperiod differences 
between annualized Buy and Sell returns are 18%, 27%, 11%, and 12%. 

Second, “returns following sell signals are negative, which is not easily explained
by any of the currently existing equilibrium models.” All four subperiods
have negative Sell averages, which cannot be explained by calendar anomalies 
because about 40% of all days are Sell days. 

Third, “returns following buy signals are less volatile than returns following
sell signals.” Standard deviations of 0.89% and 1.34% are reported, respectively
for Buy and Sell days (Brock et al., p. 1749). Consequently, if volatility measures
risk, then changes in risk levels do not explain the higher average returns for Buy 
days than for Sell days. 

Fourth, “returns from these...strategies are not consistent with...popular models.”
The most credible model assessed is the exponential ARCH model of Nelson
(1991), incorporating MA(1) and conditional variance terms in the specification
of the conditional mean, to be defined later in Sections 9.5 and 10.2. The bootstrap
p-values for the difference between Buy and Sell average returns are all less than
0.5% for this ARCH model. Although it cannot explain the difference between
the average returns, the exponential ARCH model appears to explain some of the 
observed difference between Buy volatility and Sell volatility. 

Several further research studies have applied the rules and parameter values of 
Brock et al. Bessembinder and Chan (1995a) reconsider the DJIA evidence. After
including dividend yields in the calculation of returns, they find that negative 
expected returns on Sell days only occur before 1939. They claim that realistic 
transaction costs have always exceeded the amount required to eliminate gross 
trading profits. The predictability of the DJIA could then be a consequence of 
transactions costs, varying risk premia, bandwagon effects, and/or other explana- 
tions. The bandwagon concept is rejected by showing that there is as much useful 
information in CRSP indices, which are not followed by the market, as there is 
in the DJIA. Also, the moving-average rule can predict returns from individual 
equities but not their abnormal returns with respect to market models. 

Sullivan, Timmermann, and White (1999) discuss the problem of data-snoop- 
ing. This will occur if researchers copy rules from market literature that only 
promotes those technical rules that have the best historical returns. Sullivan et al. 
reanalyze the data of Brock et al., for more rules and for many more parameter 
combinations. They confirm that there is statistically significant evidence that 
trading rules provide information about the conditional mean of the DJIA, until 
1986. However, they find no evidence that the rules are informative during the 
subsequent decade until 1996, for either the DJIA or for futures on the S&P 500 
index. We obtain the same conclusion in Section 7.6 for the S&P 100 index, from 
1991 to 2000.

Day and Wang (2002) show that the evidence for predictability may arise 
because the prices of the component stocks in the index are not always syn- 
chronous. They find that the differences between Buy and Sell average returns 
from 1962 to 1986 are much less for (i) a value-weighted index constructed from
the DJIA stocks, which gives less weight to smaller firms whose stocks may trade 
less often, and (ii) "true" levels of the DJIA given by modeling returns as an 
MA(1) process, which is consistent with nonsynchronous prices. Trading vol- 
umes were much higher from 1987 to 1996, when the average returns on Buy and 
Sell days are similar for the moving-average rule.

Fang and Xu (2003) compare DJIA average returns for the moving-average rule
with a time series rule based upon forecasting returns using an AR(1) process, 
from 1896 to 1996. Their average difference between Buy and Sell returns for the 
time series rule is more than double the figure for the moving-average rule. This 
is an interesting result, although estimation of the AR(1) parameter from all the 
data may enhance the performance of the time series rule. 


7.4.2 Other Equity Markets


Bessembinder and Chan (1995b) replicate the methods of Brock et al. for six Asian 
equity indices, from 1975 to 1989. The annualized difference between Buy and 
Sell returns for the moving-average rule averages 8% for Hong Kong, Japan, and
Korea and a massive 52% for the emerging markets of Malaysia, Thailand, and 
Taiwan. Ito (1999) provides further evidence that indices for developed (Canada, 
Japan) and emerging markets (Indonesia, Mexico, Taiwan) have been predictable, 
for the period from 1980 to 1996. 

Hudson, Dempsey, and Keasey (1996) is possibly the only study that applies 
the methodology of Brock et al. to a very long series of index levels from another 
country. They investigate the UK FT-30 index from 1935 to 1994. All their average 
Buy returns are significantly higher than average Sell returns, at low significance 
levels. The overall averages are equivalent to annual rates of 16% on Buy days
and -6% on Sell days. Significant differences are reported for subperiods until
1981 but not for the most recent subperiod from 1981 to 1994. Any profits from 
the trading rules are less than transaction costs. 

Taylor (2000) provides results for several UK series recorded from 1972 to 
1991: the Financial Times All-Share (FTA) index, calculated from the prices of 
more than 600 stocks, the prices of twelve large firms, and indices calculated 
from these prices. The moving-average rule produces values of the z-statistic 
that are significant at the 5% level for the FTA index, the 12-share indices, and
four of the twelve firms. The most predictability is found in the FTA index. The 
values of the z-statistic correlate much more highly with linear combinations 
of autocorrelations, such as variance ratios, than they do with the first-lag auto- 
correlations. 

Our first conclusion about equity returns is that their conditional means have 
depended on trading rule information for long periods of time. Our second is that 
the degree of predictability has decreased substantially in recent years. 


7.5 Evidence about the Predictability of Currency and Other Returns 


We next consider the predictability of the twenty equity, currency, and commodity 
return series defined by Table 2.2 and tested for randomness in Chapters 5 and 6.


7.5.1 The Moving-Average Rule


The issue of parameter optimization is deliberately avoided by Brock et al., who 
report the results for all the parameter combinations that they evaluated. We do 
the same and initially discuss results for all 24 combinations of short averaging 
periods S, long averaging periods L, and bandwidths B given by S = 1, 2, or 5, 
L = 50, 100, 150, or 200, and B = 0 or 1%. For all combinations, classification
of days as Buy, Sell, or Neutral commences at time 201. To avoid any impact 
from extreme outliers, the days in the crash week (commencing on 19 October 
1987) are excluded from the Buy and Sell sets when the summary statistics are 
calculated. 

The columns of Table 7.1 present the averages of quantities across all 24 parameter
combinations. These quantities are the numbers of Buy and Sell days $n_I$, $n_J$,
the average returns $\bar{r}_I$, $\bar{r}_J$, the probabilities of price rises $p_I$, $p_J$, the differences
in average returns $\bar{r}_I - \bar{r}_J$, the standardized test statistics z, and the standard deviations
$s_I$, $s_J$. Table 7.1 also shows how many of the combinations produce values
of z below -1.96 or above 1.96 that reject the null hypothesis of a stationary
martingale difference for excess returns at the 5% level.

It is very clear from Table 7.1 and perhaps surprising that far more evidence is 
found for predictability in each of the four currency series than in any other series. 
For the yen, 15 of the 24 z-values exceed 1.96, with higher counts of 18 for the 
Deutsche mark and the Swiss franc and 19 for sterling. We should note, however, 
that many of the z-values for currencies are near 1.96 and that the 24 test values 
for a currency do not provide 24 independent test results. There are only seven 
values of z above 1.96 in total for the other sixteen series and there are no values 
below -1.96. The average probabilities for the currencies of price rises on Buy
days are 5-6% higher than the corresponding probabilities on Sell days. The differences $p_I - p_J$
are much less for most of the other series.

The differences $s_J - s_I$ between the Sell and Buy standard deviations are
generally higher for the equity series than for the other series. There is some 
evidence that volatility is higher following falling prices than following rising 
prices, but only for equities. The Buy standard deviations are less than the Sell 
standard deviations for all the stock indices and all three US stock firms. The 
difference is most pronounced for the Nikkei series, with averages of 0.92% and 
1.79% respectively for Buy and Sell days. Almost no difference is detected for 
the three UK firms, although Taylor (2000) finds $s_I < s_J$ for all twelve UK firms
analyzed for the longer period from 1972 to 1991. 

The values of the longer averaging period, L, used to produce Table 7.1 follow 
Brock et al. and are much higher than those used in some other studies, e.g. Taylor 
(1992). Repeating the calculations, with the only change being that L = 10, 20, 
30, or 40, produces 100 significant values of z at the 5% level (21% of the 480 
test values) compared with 77 significant values in Table 7.1 (16% of the test
values). For the lower values of L, there are ten or more significant z-statistics
for the S&P 500 futures returns, the FTSE 100 spot returns and the yen and corn 
futures returns; the significant values for these series are negative for the S&P 
and positive for the other assets. 


7.5.2 The Channel and Filter Rules


The initial channel rule results are for the eight combinations of channel lengths 
L and bandwidths B given by L = 50, 100, 150, or 200 and B = 0 or 1%, once 
more motivated by Brock et al. Classification commences with time period 202 
using prices up to time 201 and a neutral position at time 200. The proportion 
of significant test values z (6%) is similar to the significance level (5%). The
currencies again have the majority of the significant results. A slight fall in the
overall average mean difference, $\bar{r}_I - \bar{r}_J$, for the currencies from 0.068% for the
moving-average rule to 0.055% for the channel rule has a considerable impact on
the proportion of significant currency mean differences. Shorter channel lengths
L are recommended in Taylor (1994b), for example. Once more, repeating the
calculations with L = 10, 20, 30, or 40 produces similar numbers, both for the
number of significant results and for the currency mean differences. 

For the filter rule, appropriate filter sizes f have been selected by considering 
the standard deviations of daily returns and the sizes assessed by Sweeney (1986). 
Results have been obtained for all eight combinations of f and bandwidths B 
given by f = 2%, 4%, 6%, or 8% and B = 0 or 1%. Classification commences 
with time period 3, assuming a neutral position at time 1. Overall, 23% of the 
z-statistics are significant at the 5% level: 25 of them are above 1.96 and of these
14 are for currency series, while 10 values of z for US equity series are below
-1.96. The overall currency mean difference, $\bar{r}_I - \bar{r}_J$, is now 0.053% and hence
very similar to the average level for the channel rule that has fewer significant 
results. 


7.5.3 Comparisons with Random Walk Tests 


The various random walk tests applied to the same data in Chapter 6 reject the null 
hypothesis for 32% of the tests when the significance level is 5%. This rejection
frequency exceeds that for all three trading rules. The correlations between the 
average values of z in Table 7.1 and the twenty-day variance-ratio test statistics in 
Table 5.2 are 0.69 for returns and 0.60 for rescaled returns, while the correlations 
are almost zero for two- and five-day variance ratios. Similar correlations are 
reported in Taylor (2000) for a different set of time series. 


7.5.4 Currencies 


There is clearly something noteworthy about the series for the four currencies, 
from 1982 to 1991, that is detected by a comparison of the average Buy return
with the average Sell return. The average difference $\bar{r}_I - \bar{r}_J$ across currencies is
always between 0.05% and 0.07% for the five specifications of rules and parameter
combinations mentioned in this section, and this is enough to be of economic
importance. Based upon Table 7.1, an average year contains 112 Buy days that
provided a return of 4.9%, and 104 Sell days that provided a return of -2.5%.
Thus a trader who was long (short) on Buy (Sell) days would have made more
than 7% p.a. before transactions costs. Very similar results are documented in
Kho (1996) for weekly returns from currency futures, from 1981 to 1991. 
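
The annual figures can be checked against the daily averages with a little arithmetic. The inputs are the rounded values quoted in this paragraph, so the results are approximate.

```latex
\begin{align*}
\bar{r}_I &\approx 4.9\% / 112 \approx 0.044\% \text{ per Buy day}, \\
\bar{r}_J &\approx -2.5\% / 104 \approx -0.024\% \text{ per Sell day}, \\
\bar{r}_I - \bar{r}_J &\approx 0.068\%, \\
\text{long on Buy days, short on Sell days:} \quad & 4.9\% - (-2.5\%) = 7.4\% \text{ p.a.}
\end{align*}
```

The daily difference of about 0.068% agrees with the currency mean difference quoted earlier in this section for the moving-average rule.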

As we will see in Section 7.10, the 1980s was a decade when currency specula- 
tion would have been particularly successful. During the later period from 1991 
to 2000, the average difference $\bar{r}_I - \bar{r}_J$ for the spot DM/$ rate is between 0.02%
and 0.05% for many values of the moving-average parameter L but none of the 
test values z are significant at the 5% level. 


7.6 An Example of Calculations for the Moving-Average Rule 


We now present a spreadsheet that shows how we can classify days for the moving- 
average rule and then test the null hypothesis that returns (in excess of their mean 
level) are a stationary martingale difference. The illustrative results are for the 
S&P 100-share index from January 1991 to December 2000. This section can be 
skipped by readers who are not interested in trading rule software. 

We first have to select the three parameters of the trading rule. A reasonable 
method maximizes the difference between average Buy returns and average Sell 
returns for a relevant series that predates the data to be tested. This motivates the 
choices S = 1, L = 50, and B = 1%, obtained by selecting the best values for the 
DJIA series from 1897 to 1968 for the 24 combinations evaluated in Section 7.5 
(Taylor 2000). 

Exhibit 7.1 shows Excel calculations, with the key Excel formulae listed in 
Table 7.2. The 2531 index levels $p_t$ are located in cells B3 to B2533. We work
with percentage returns and henceforth drop the percentage adjective. The first
return in cell C4 is given by 100*LN(B4/B3). The squared returns are also used
in the calculations, with the first value placed in cell D4. Columns C and D are
completed by copying and pasting the 1 × 2 rectangle C4:D4 as far as row 2533.

The three parameters of the trading rule are in cells F38, G38, and H38. Each 
day is classified as belonging to one of three classes whose names are inserted into 
cells G40, H40, and I40. As S = 1, it is not necessary to calculate any short-term
moving averages. 

The next step is to fill the seven cells E52, F52, and E53 to I53. The function
AVERAGE provides the long-term moving-average values shown in E52 and E53,
while the relative differences $R_t$ defined by equation (7.2) commence with F52
and F53. Each day is classified by using the previous relative difference. Cell
G53 shows that 14 March is classified as a Buy, since the short-term average
on 13 March is more than 1% above the long-term average. The classification
formula uses the IF function twice to make comparisons between the percentage
relative difference and the percentage bandwidth found in H38. Columns H and
I also employ the IF function, to respectively determine when sequences of Buys
and Sells commence that correspond to the commencement of trades, as will
be explained in Section 7.8. Thus H53 and I53 respectively contain Yes and No
because a Buy sequence commences in row 53.

Exhibit 7.1. Predictability calculations for the S&P 100 index, 1991-2000.

Table 7.2. Formulae used in the spreadsheet for the moving-average rule.

Cell   Formula

C4     =100*LN(B4/B3)
D4     =C4*C4
E52    =AVERAGE(B3:B52)
F52    =100*(B52-E52)/E52
G41    =COUNTIF($G$53:$G$2533,G40)
G42    =SUMIF($G$53:$G$2533,G40,$C$53:$C$2533)
G43    =SUMIF($G$53:$G$2533,G40,$D$53:$D$2533)
G44    =G42/G41
G45    =(G43-G41*G44*G44)/(G41-1)
G46    =SQRT(G45)
G47    =$F$2*G44
G48    =COUNTIF(H53:H2533,"Yes")
G49    =G41/G48
G53    =IF(F52>$H$38,$G$40,IF(F52<-$H$38,$H$40,$I$40))
H53    =IF(($G53=$G$40)*AND($G52<>$G$40),"Yes","No")
I38    =(G44-H44)/SQRT((G45/G41)+(H45/H41))
I53    =IF(($G53=$H$40)*AND($G52<>$H$40),"Yes","No")
J38    =0.5*(G44-H44)/((1/G49)+(1/H49))

Columns E to I are completed by copying and pasting the 1 × 5 rectangle
E53:I53 as far as the final row of the spreadsheet.

The returns for the Buy days are summarized in cells G41 to G49. The number of 
Buy days is obtained using the COUNTIF function, which here counts how many 
of the cells in G53:G2533 are the same as cell G40. The sum of the Buy returns is 
given by the SUMIF function, which here sums all the numbers in C53:C2533 for 
which the matched cell in column G is the same as cell G40. Likewise, we obtain 
the sum of the squared Buy returns, followed by the one-day mean, variance, and 
standard deviation, the annualized mean, the number of trades, and their average 
duration in trading periods. The Sell and Neutral summary statistics are given by 
copying and pasting the Buy formulae. Finally, we calculate the test statistic z of 
equation (7.14) and the breakeven transaction cost C* of equation (7.19). 
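
Readers who prefer a script to a spreadsheet can reproduce the main steps along the following lines. This is a sketch for S = 1 only. It assumes that the relative difference is 100 times the price minus the long-term average, divided by the long-term average (as in the spreadsheet formula for F52), and it omits the breakeven cost because equation (7.19) appears later. The function name is hypothetical.

```python
import math
from typing import Dict, List, Tuple


def moving_average_summary(prices: List[float], L: int = 50,
                           B: float = 1.0) -> Tuple[List[str], Dict[str, dict], float]:
    """Classify days for S = 1, then summarise the Buy and Sell returns."""
    n = len(prices)
    # Percentage log returns; returns[t] is the return from day t - 1 to day t.
    returns = [0.0] + [100 * math.log(prices[t] / prices[t - 1]) for t in range(1, n)]
    labels = [""] * n
    for t in range(L - 1, n - 1):
        long_avg = sum(prices[t - L + 1:t + 1]) / L
        rel_diff = 100 * (prices[t] - long_avg) / long_avg  # percentage, like B
        labels[t + 1] = "Buy" if rel_diff > B else "Sell" if rel_diff < -B else "Neutral"
    stats = {}
    for cls in ("Buy", "Sell"):
        rs = [returns[t] for t in range(n) if labels[t] == cls]
        count = len(rs)
        mean = sum(rs) / count
        var = (sum(r * r for r in rs) - count * mean * mean) / (count - 1)
        # A trade starts whenever a run of this class begins.
        trades = sum(1 for t in range(1, n) if labels[t] == cls and labels[t - 1] != cls)
        stats[cls] = {"n": count, "mean": mean, "var": var,
                      "trades": trades, "duration": count / trades}
    b, s = stats["Buy"], stats["Sell"]
    z = (b["mean"] - s["mean"]) / math.sqrt(b["var"] / b["n"] + s["var"] / s["n"])
    return labels, stats, z
```

The function needs at least two Buy days and two Sell days, otherwise the variance or duration calculations divide by zero.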

The results show that the annualized average Sell return of 33% is much higher
than the annualized average Buy return of 6%, which is interesting when compared
with the low Sell returns of Brock et al. However, the z-statistic is -1.47 so that
the difference between the averages is not significant at the 10% level. Another
interesting result is that the standard deviation of the Buy returns (0.84%) is much
lower than that of the Sell returns (1.43%).


7.7 Efficient Markets: Methodological Issues


Significant evidence of price predictability does not necessarily imply that a mar- 
ket is inefficient. Rather, to argue that a market is inefficient it is necessary to find 
a trading rule that has superior performance when it is compared with a passive 
benchmark. A suitable passive benchmark could be to buy and hold an asset or 
it could be risk-free investment. The risk-adjusted returns from these two passive 
strategies are both equal to the risk-free interest rate. Superior performance is then 
equivalent to a risk-adjusted return, net of all costs, that exceeds the risk-free rate. 
Like Jensen (1978), we define the weak form of the efficient market hypothesis 
by the following statement: 


No trading rule has an expected, risk-adjusted, net return greater than 
that provided by risk-free investment. 


Convincing evidence against the hypothesis requires superior performance that 
is both statistically and economically significant. 

Many issues arise for the above definition of market efficiency. Trading costs, 
resources, opportunities to diversify, and risk aversion vary considerably among 
individuals and among institutional investors. Hence performance from the same 
trading rule varies across traders and a market could be efficient for some and not 
for others. 

Measuring performance requires risk adjustments. These are straightforward 
when all investment is in domestic equities and only one risk factor is priced. 
It is far from clear how the adjustments should be performed, however, when 
investment opportunities are international and include currencies, commodities, 
bills, bonds, and real estate. 

Trading rules are usually evaluated after some form of optimization. It is well 
known that trading decisions have a favorable bias when parameters are selected 
after (ex post) finding the values that give the best results. Instead, parameters 
should be selected using only the information available before (ex ante) the out- 
comes from decisions are known. The amount of data used for optimizations is 
often subjective—using approximately one-third of the available prices is fairly 
common (e.g. Sweeney 1986), while three years of daily prices can be sufficient 
to obtain satisfactory parameter values for a simple rule (Taylor 1994b). 
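
The ex ante principle can be sketched as a split of the price history into an early part used for optimization and a later part used for evaluation. In the sketch below, `score` is a hypothetical callable supplied by the user, for instance the difference between average Buy and Sell returns for a given parameter combination. The default of one-third of the prices follows the common practice mentioned above.

```python
import math
from typing import Callable, Iterable, Optional, Sequence, Tuple


def select_ex_ante(prices: Sequence[float],
                   grid: Iterable[tuple],
                   score: Callable[[Sequence[float], tuple], Optional[float]],
                   train_len: Optional[int] = None) -> Tuple[Optional[tuple], float, Optional[float]]:
    """Choose parameters on the early prices only, then score them on later prices."""
    if train_len is None:
        train_len = len(prices) // 3  # assumption: first third used for optimization
    train, later = prices[:train_len], prices[train_len:]
    best, best_score = None, -math.inf
    for params in grid:
        value = score(train, params)  # only ex ante information is used here
        if value is not None and value > best_score:
            best, best_score = params, value
    # The later prices play no part in the choice of parameters.
    out_of_sample = score(later, best) if best is not None else None
    return best, best_score, out_of_sample
```

Only the third value returned is free of the favorable selection bias; the second is the optimized, in-sample figure.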

Bias can appear in many other ways. The rule itself can be chosen ex post, 
possibly by unintentional “data snooping” (Sullivan et al. 1999). This possibility 
can be avoided by using past prices to define the structure of the trading rule, by
using the genetic algorithm methodology of Allen and Karjalainen (1999), Ready
(2002), Neely, Weller, and Dittmar (1997), and Neely and Weller (2001). Lukac 
et al. (1988) and Lukac and Brorsen (1990) find that their best ex post rule is 
much better than the rule selected ex ante as time progresses. Also, Elton, Gruber, 
and Rentzler (1987) show that speculative funds possessing superior prospectus 
performance have indifferent subsequent performance. At present most academic 
researchers find superior performance more interesting than inferior results, so 
there is a potential selection bias in academic literature. This effect can be nullified 
by replicating published studies on later data, providing the results are published 
regardless of the outcome. 

It is important to realize that all trading rule evaluations assume that it is possible
to trade without altering the path taken by subsequent prices. This presumes first 
that there is a sufficient level of liquidity and second that any superior trader can 
avoid other traders copying a successful rule. 

Trading rules are often evaluated for the prices of futures contracts. The evalua- 
tion of trading performance is then complicated by the small amount of collateral 
needed to initiate trades. If the margin deposit is essentially zero, then the calcula- 
tion of returns and adjustments for risk are problematic. A practical resolution of 
these issues is given in Section 7.9. The possibility of a geared position in futures 
does not affect the efficiency of a market as gearing simply magnifies expected 
excess returns, risk adjustments, and the standard deviation of excess returns by 
the same amount. 

Two methodologies for assessing efficiency are now described. The method 
in Section 7.8 is particularly appropriate for stocks and stock indices. The asset 
traded then has substantial systematic risk, short selling may be difficult and 
the natural benchmark is buy and hold. The other method, to be described in 
Section 7.9, is appropriate for trading futures. Trading rules can then have minimal 
or no systematic risk, even when the underlying asset is a stock index, because 
short positions are often as common as long positions. The natural benchmark is 
then risk-free investment. 


7.8 Breakeven Costs for Trading Rules Applied to Equities 


Significant differences between the average returns on Buy and Sell days are only 
evidence against market efficiency if transaction costs are sufficiently low and 
special assumptions can be made about risk. A standard assumption, made here 
following Sweeney (1986), is that the risk premium for holding an asset is the 
same on Buy days as on Sell days. There is always a possibility that trading rules 
seek out a time-varying risk premium, so that Buy days have a higher average 
premium than Sell days. 


7.8.1 A Breakeven Formula


The breakeven cost for a trading rule is the level of transaction costs that ensures 
the average daily risk-adjusted profit, in excess of risk-free interest payments, 
equals the average daily expenditure on trading. Breakeven costs depend on the 
quantity of the asset held on Buy days, on any Neutral days, and on Sell days. 
Formulae for calculating average risk-adjusted returns are provided by Sweeney 
(1986, 1988), Day and Wang (2002), and Olson (2004) when the three quantities 
are integers, such as (1, 0, 0) and (1, 0, −1). We may note that (1, 0, −1) and
(2, 1, 0) have identical risk-adjusted returns, because adding a fixed quantity to a 
portfolio will not change its risk-adjusted return. 

In general, we may decide that the benchmark strategy is “buy and hold” and
that this strategy owns some quantity $q_{t,BH}$ from the close on day $t$ to the
close on day $t+1$. During the same time period suppose someone who follows a
trading rule holds the quantity:


$$
q_t = \begin{cases}
(1 + Q_I)\,q_{t,BH} & \text{if period } t+1 \text{ is classified as Buy,} \\
q_{t,BH} & \text{if period } t+1 \text{ is classified as Neutral,} \\
(1 + Q_J)\,q_{t,BH} & \text{if period } t+1 \text{ is classified as Sell.}
\end{cases}
\qquad (7.18)
$$


The specific choices $Q_I = n_J/n_I$ and $Q_J = -1$, with $n_I$ and $n_J$ the numbers
of Buy and Sell days observed, have two advantages. First, the ex post risks of
the benchmark and trading strategies are the same and, second, short selling is
not necessary. The total risk-adjusted return for these choices, in excess of the
risk-free rate, is simply

$$
Q_I n_I(\bar r_I - \bar r_{fI}) + Q_J n_J(\bar r_J - \bar r_{fJ})
  = n_J\left[\bar r_I - \bar r_J - (\bar r_{fI} - \bar r_{fJ})\right]
$$

with $\bar r_I$ and $\bar r_J$ the average Buy and Sell returns, and $\bar r_{fI}$ and $\bar r_{fJ}$ denoting the
average risk-free rates for Buy and Sell days.

Now also assume that transaction costs are a proportion C of the price of the
goods bought or sold. Also let $D_I$ and $D_J$ be the average durations of Buy and
Sell trades, with a Buy (Sell) trade defined in the obvious way as a sequence
of consecutive Buy (Sell) days. Then the reduction in the total return caused by
transaction costs is

$$
2C Q_I \frac{n_I}{D_I} - 2C Q_J \frac{n_J}{D_J}
  = 2C n_J\left(\frac{1}{D_I} + \frac{1}{D_J}\right).
$$

This reduction equals the total risk-adjusted return, in excess of the risk-free rate, 
when C equals the breakeven cost: 

$$
C^* = \frac{\bar r_I - \bar r_J - (\bar r_{fI} - \bar r_{fJ})}{2(1/D_I + 1/D_J)}.
\qquad (7.19)
$$

This formula assumes capital is required to finance trades. This is not the case for
futures and then the interest-rate terms should be removed.

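
Equation (7.19) can be evaluated directly from a classified series of returns. The following Python sketch is a minimal illustration under stated assumptions: the function names and the label codes "B", "N", and "S" (for Buy, Neutral, and Sell days) are hypothetical, and the inputs are taken to be equal-length lists of daily returns, daily risk-free rates, and day labels. The numerator is the Buy-minus-Sell difference in average returns less the corresponding difference in average risk-free rates; the denominator is twice the sum of the reciprocal average trade durations.

```python
def average_duration(labels, target):
    """Average length of runs of consecutive days carrying the label `target`."""
    runs, current = [], 0
    for label in labels:
        if label == target:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return sum(runs) / len(runs)


def breakeven_cost(returns, riskfree, labels):
    """Breakeven proportional cost C* in the form of equation (7.19)."""
    buy = [i for i, label in enumerate(labels) if label == "B"]
    sell = [i for i, label in enumerate(labels) if label == "S"]

    def mean(index, values):
        return sum(values[i] for i in index) / len(index)

    spread = (mean(buy, returns) - mean(sell, returns)) \
        - (mean(buy, riskfree) - mean(sell, riskfree))
    d_buy = average_duration(labels, "B")
    d_sell = average_duration(labels, "S")
    return spread / (2.0 * (1.0 / d_buy + 1.0 / d_sell))
```

For example, if Buy and Sell trades both last two days on average and the Buy-minus-Sell difference in average daily returns is 0.4%, with equal risk-free rates, the function returns a breakeven cost of 0.2%. For futures, a list of zeros can be passed as the risk-free rates so that the interest-rate terms drop out.
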
When C < C*, a trader could have outperformed the benchmark strategy. It
is not possible to use a sample value of C* to perform a simple test of the null
hypothesis of market efficiency; for our definition of efficiency, the autocorrelations
of returns can all be positive and hence they can inflate the standard errors
of terms like $\bar r_I$ above $s/\sqrt{n_I}$, which is only the appropriate value for the random
walk hypothesis.


7.8.2 Estimates 


Bessembinder and Chan (1995a, 1998) use the same rules and parameter com- 
binations as Brock et al. (1992) to estimate breakeven costs for the DJIA from 
1926 to 1991. Across all combinations, they state that the breakeven cost equals 
0.39% when a “double or out” strategy is followed, so that


$$Q_I = 1 \quad \text{and} \quad Q_J = -1;$$


they assume this strategy has the same risk as “buy and hold.” The breakeven cost 
falls to 0.22% for their most recent subsample, from 1976 to 1991. They report 
bid-ask spreads of 0.12% (Knez and Ready 1996) and commission costs of 0.13%
for institutional traders (Chan and Lakonishok 1993), for the later years of their 
sample, giving a total cost of 0.25%. Bessembinder and Chan (1998) conclude 
that there is little reason to view the differences between Buy and Sell average 
returns as indicative of market inefficiencies. They apply the same methodology 
to Asian equity indices in Bessembinder and Chan (1995b). 

Taylor (2000) uses optimized parameters that give the least breakeven cost 
for the DJIA from 1897 to 1968. For the moving-average rule this gives a zero 
bandwidth and the highest levels of the averaging periods considered: 5 and 
200 days. These choices minimize the number of transactions, in contrast to 
the choices that maximize measures of predictability and simultaneously also 
maximize transactions. The breakeven cost C* equals 1.07% for the DJIA, from 
1968 to 1988. This suggests profit opportunities for traders able to learn from the 
information available in 1968. However, they would have had to trade the thirty 
constituent stocks simultaneously at the prices used for index calculations, which 
is not very plausible (Day and Wang 2002). Taylor (2000) also finds the average 
of C* across twelve UK firms is a mere 0.08% for the optimized parameters and 
concludes that there is no evidence that the market for UK stocks was inefficient 
in the 1970s and 1980s. 

Most of the values of C* for the twenty series discussed in Section 7.5 are 
near zero when the moving-average rule is applied with the optimized parameters 
given above. The average breakeven cost equals 0.31% for the sixteen noncurrency 
series, but the four breakeven currency costs are more substantial and range from 
0.59% to 2.25%. 


7.9 Trading Rule Performance for Futures Contracts
7.9.1 A Measure of Trading Performance


Trading rules have often been assessed by applying them to futures contracts. 
Trading costs are then often lower than for spot transactions and less capital is 
required to finance decisions. The amount of capital required is actually irrelevant 
when the trader has sufficient resources to deposit Treasury bills as margin collateral.
We assume a representative trader can do so and that the total cost of opening and
later closing a futures position is a fixed proportion c of the initial price of the
goods traded. With these assumptions, Taylor (1988, 1992) derives the following
performance measure for N trades in some futures contract, with trade j begun at
price $p_j$ and concluded at price $q_j$ and with $\pi_j$ equal to either 1 or −1, respectively
if the trader is long or short for trade j:


$$
R = \sum_{j=1}^{N} \left\{ \pi_j \left( \frac{q_j - p_j}{p_j} \right) - c \right\}.
\qquad (7.20)
$$


This measure is positive whenever the average proportional price movement dur- 
ing trades (in the direction wanted by the trader) exceeds the proportional trans- 
action cost. 
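
As a minimal sketch, the measure can be computed from a list of completed trades. The code assumes, in line with the interpretation just given, that each trade contributes its signed proportional price change less the proportional cost c; the function name and the tuple layout (position, entry price, exit price) are hypothetical choices made for this illustration.

```python
def excess_return(trades, cost):
    """Sum over trades of position * (exit - entry) / entry - cost.

    `trades` is a list of (position, entry_price, exit_price) tuples with
    position equal to +1 for a long trade and -1 for a short trade;
    `cost` is the proportional roundtrip cost c.
    """
    return sum(
        position * (exit_price - entry_price) / entry_price - cost
        for position, entry_price, exit_price in trades
    )
```

A long trade from 100 to 103 followed by a short trade from 103 to 101, with c = 0.2%, gives 3% plus about 1.94% less 0.4%, an excess return of roughly 4.5%.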

The quantity R is interpreted as the return from a risky investment minus the 
return from a risk-free investment. To see why, suppose a trading rule produces 
trading positions (long, short, or neutral) from time $t_1$ to time $t_2$. The risk-free
investment is the purchase at time $t_1$, at a price $B_1$, of a T-bill that matures at
(or soon after) time $t_2$, followed by its sale at time $t_2$. The corresponding risky
investment is identical except the T-bill is the margin deposit for all N trades
and the number of contracts traded is $B_1/(Q p_j)$ for trade j with Q the quantity
of goods per contract. The risky investment will be called a risky bill. Then R
is the return from a risky bill minus the return from the corresponding risk-free 
T-bill and thus R is an excess return. The definition of the excess return ignores 
minor cash flows, including interest paid on reinvested profits and losses, interest 
foregone on funds used to pay transaction costs and any cash flows from daily 
settlement of profits and losses. These neglected terms are not important (Taylor 
1988, 1992). 

The excess return measure R can be used to discuss the efficiency of a futures 
market if we suppose a representative trader manages a well-diversified portfolio 
that includes short-maturity bills and equities. This trader will buy and sell futures 
contracts if this improves the distribution of portfolio returns, for example, by 
reducing the portfolio variance without reducing the portfolio mean. The trader 
will include risky bills in the managed portfolio if their expected return exceeds 
the level offered by comparable risky assets. In particular, a futures market can 
be called inefficient if expected excess returns on risky bills are positive and the 
systematic risk (“beta”) of risky bills with respect to a market portfolio is zero or
negative. When these conditions apply, the trader can form a portfolio including 
risky bills that improves upon any benchmark portfolio that invests only in risk- 
free bills and the market portfolio. It may be possible for the T-bill collateral to be 
used to finance a geared futures position but this will merely multiply the excess 
return by the gearing factor, which has no impact on the efficiency of the futures 
market. 


7.9.2 Examples of Positive Excess Returns 


Positive average excess returns are obtained for the US Treasury bond futures 
market in Taylor (1988), but the averages are not significantly different from zero 
at the 5% level; the betas of the excess returns are essentially zero. 

The double moving-average, channel, filter, and statistical rules of Section 7.2 
are evaluated using ex ante parameter values for sterling, Deutsche mark, Swiss 
franc, and yen futures in Taylor (1992), for the highly profitable six-year period 
from December 1981 to November 1987. The futures trading positions can be 
obtained using the classification algorithms; for example, a long trade is initiated 
and concluded at the closing prices on days s and t respectively if days s + 1 to 
t are all Buy days, but both days s and t + 1 are not Buy days. 
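
The conversion from daily classifications to trades can be sketched as follows. This is an illustrative Python fragment rather than the algorithm of Taylor (1992): the function name, the label codes "B", "N", and "S", and the indexing convention are assumptions. Short trades are built from runs of Sell days by symmetry with the long-trade definition above, and a run still open on the last day is closed at the final closing price.

```python
def trades_from_labels(labels, closes):
    """Turn day classifications into (position, entry_price, exit_price) trades.

    labels[d] in {"B", "N", "S"} classifies day d for d >= 1 (labels[0] is unused);
    closes[d] is the closing price on day d. A run of Buy days s+1, ..., t gives a
    long trade opened at closes[s] and closed at closes[t]; a run of Sell days
    gives a short trade in the same way (assumed by symmetry).
    """
    trades = []
    last_day = len(labels) - 1
    d = 1
    while d <= last_day:
        label = labels[d]
        if label in ("B", "S"):
            s = d - 1  # the trade is opened at the close of day s
            while d + 1 <= last_day and labels[d + 1] == label:
                d += 1
            t = d  # the trade is closed at the close of day t
            position = 1 if label == "B" else -1
            trades.append((position, closes[s], closes[t]))
        d += 1
    return trades
```

The resulting list has the layout needed to evaluate a sum of signed proportional price changes less costs, as in equation (7.20).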

The overall excess return, obtained by averaging across rules, currencies, and 
contracts, is 7.2% p.a. when c = 0.2%. The rules average 7.8 trades p.a. so the
gross excess return is 8.8% p.a. From the average excess return across rules and 
currencies, for each of twelve six-month trading periods, a t-ratio of 4.40 can be 
calculated that is highly significant. The same twelve numbers are regressed upon 
S&P 500 index returns minus T-bill returns to produce a beta estimate of 0.04 and 
a 95% confidence interval from −0.09 to 0.18. The beta estimate is very similar
for a global equity index. It is concluded that the currency futures market appears 
to have been inefficient. 
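
The two statistics used here, a t-ratio for the mean of the period averages and a regression beta against equity excess returns, can be sketched in a few lines of Python. The twelve six-month figures are not reproduced in this chapter, so the functions below are shown without data; their names are hypothetical, and the t-ratio assumes the period averages are independent observations with a common variance.

```python
import math
import statistics


def t_ratio(values):
    """Sample mean divided by its estimated standard error (n - 1 divisor)."""
    n = len(values)
    return statistics.mean(values) / (statistics.stdev(values) / math.sqrt(n))


def ols_beta(y, x):
    """Slope from an ordinary least squares regression of y on a constant and x."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    return s_xy / s_xx
```

Here `y` would hold the trading rule excess returns for each period and `x` the matching equity index returns minus T-bill returns.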

For the sixteen combinations of rule and currency, all the average excess returns 
are positive and ten of their t-ratios are significant at the 5% level using a one-tail 
test. ANOVA tests do not find significant differences in average excess returns, 
either between currencies or between rules. It is, of course, wrong to select the 
trading rule parameters ex post. The average portfolio excess return when this is 
done equals 11.7% p.a. and hence ex post optimization adds 4.5% to the ex ante 
average of 7.2%. 


7.9.3 Further Results 


Average excess returns have also been calculated for the four rules applied to the 
eleven futures series whose predictability was considered in Section 7.5. However, 
the results do not provide any additional evidence against the efficiency of futures 
markets. 

The selection of ranges used for parameter optimizations is subjective to some
extent. The bandwidth B is set to zero. The short-term average of the moving-average
rule is always the latest price (S = 1) and the long-term averaging period
L is between 2 and 60. The channel rule length L is chosen from the same set. The
percentage filter size f is selected from {0.5, 1, 1.5, ..., 25}. The parameters of
the statistical rule are given by jointly optimizing A from {0.01, 0.02, 0.03, 0.04},
$\phi$ from {0.91, 0.93, 0.95, 0.97, 0.99} and k* from {0.2, 0.4, 0.6, 0.8, 1}. Except
for the filter rule, the parameter levels can be selected without considering the
standard deviation of returns. As the sterling bill returns have much smaller standard
deviations, all the candidate filter sizes are then divided by five.

Each futures series spans ten years. For each asset, the first three years are 
used to obtain optimized parameters for the first contract traded and, thereafter, 
contracts are traded across a seven-year period with parameters optimized ex ante 
separately for every contract using all the available ex ante price information. The 
optimized parameters maximize the average excess return, for some proportional 
cost c. As before, c is set to 0.2% except for bill futures when it is set to 0.04%. 
Most retail speculators in the 1980s would have had lower transaction costs (commission
plus bid-ask spreads) than assumed here (Fink and Feduniak 1988) and
institutional traders would pay less. 

The overall average excess return for the four currencies is 5.7% p.a. for the 
seven-year period that ends in November 1991. The average is very high for the 
first three years, which are part of the Taylor (1992) study, but it is only 1.3% p.a. 
for the final four years. The average figure for the other seven assets is 1% p.a. 
before transaction costs, but it is negative after costs are deducted. The annual 
averages for each of these seven assets are then between −1.5% and 0.3%.


7.10 The Efficiency of Currency Markets 
7.10.1 Trading Profits 


Studies of simple trading rules applied to dollar exchange rates have generally found
evidence of trading profits in the 1970s and the 1980s, but the evidence from the
1990s is less encouraging for speculators. Dooley and Shafer (1983), Sweeney
(1986), Taylor (1986, 1992), Levich and Thomas (1993), Kho (1996), Szakmary 
and Mathur (1997), and LeBaron (1999) all report currency trading profits for 
dollar rates. Furthermore, Okunev and White (2003) show that momentum strate- 
gies derived from moving averages of monthly rates, which buy strong currencies 
and sell weak currencies, would have been profitable from 1980 to 2000. 

The magnitude of trading profits for ungeared futures trades can be measured 
by the excess return measure given by equation (7.20). Some studies use spot rates 
and suppose the trader switches between holding domestic and foreign currency. 
Doubling the trading profits from long/neutral spot positions gives the same prof- 
its as from long/short futures positions, providing (a) spot results are adjusted 
for differences between domestic and foreign interest rates and (b) there are no
arbitrage opportunities between the spot and futures markets. 

We now consider the magnitude of annual excess returns for various time peri- 
ods. Initially, we consider five studies that cover the second half of the 1970s and 
the 1980s. These studies indicate that average excess returns were very approxi- 
mately 7% p.a. during this fifteen-year period, before transaction costs. 

Sweeney (1986) evaluates the filter rule for ten series of spot dollar exchange 
rates. From his Table V, an average annual profit of 2.5% is obtained from 1976 
to 1980, for filter sizes that give significant profits before 1976. Doubling 2.5%
gives an excess return of 5% above the dollar risk-free interest rate, which may 
reduce to 4% after transaction costs. The futures trades in Taylor (1986) give net 
excess returns of 7% from 1979 to 1981. The trades described in Section 7.9 
and Taylor (1992) have average net excess returns of 4% from 1982 to 1984 and 
10% from 1984 to 1987. Levich and Thomas (1993) evaluate filter and moving-average
rules for the daily futures prices of five currencies. They do not optimize
parameters. Their average gross excess returns for the filter rule, across all filter 
sizes and all currencies, are 7% from 1976 to 1980, 7% from 1981 to 1985, and 
4% from 1986 to 1990. Kho (1996) obtains higher gross excess returns when 
the moving-average rule of Brock et al. (1992) is applied to weekly prices for 
futures on four currencies from 1981 to 1991. The average return is 10%, across 
all currencies and parameter combinations. 

The comprehensive analysis by Olson (2004) of daily spot rates for eighteen 
dollar exchange rates from 1971 to 2000 shows that currency trading profits 
have declined in the 1990s. He applies the double moving-average rule, without 
a bandwidth parameter, and reports results for moving-average parameters optimized
over five-year periods. After deducting roundturn transaction costs of 0.1%
and then doubling his long/neutral spot excess returns, his Tables 4-8 provide the
following out-of-sample (i.e. ex ante) net excess returns, firstly for all eighteen 
currencies and secondly for the four major currencies (sterling, Deutsche mark, 
Swiss franc, and yen) used in many prior studies: 


             All 18   Major 4
1976-1980      8%       17%
1981-1985      2%        4%
1986-1990     10%       15%
1991-1995      0%        2%
1996-2000     −3%       −3%


The profits until 1990 are surprisingly high, when they are compared with previous 
studies, and they do not continue into the 1990s. Adjusting the excess returns 
for a constant risk premium, estimated from the buy-and-hold return for foreign 
currency in the same period, as in Sweeney (1986), gives the following risk- 
adjusted results: 


             All 18   Major 4
1976-1980      7%       10%
1981-1985      6%        1%
1986-1990      2%        4%
1991-1995      1%        1%
1996-2000      0%        0%


Olson (2004) concludes that the trading profits before 1991 may have been a 
temporary inefficiency that has now disappeared. It is therefore intriguing that 
Okunev and White (2003) report trading profits from their momentum strategy 
during the 1990s. 


7.10.2 Explanations 


The positive average excess returns from currency trading before the 1990s are 
significantly different from zero (Taylor 1992; Levich and Thomas 1993; Kho 
1996). They cannot be explained by a single, priced risk factor that is identified 
with an equity market portfolio (Taylor 1992; Okunev and White 2003; Olson 
2004). The most popular explanations of the excess returns refer to either a time- 
varying risk premium (TVRP) or to central bank intervention. 

The TVRP explanation can be motivated by international asset pricing models, 
but the magnitude of the premium in forward exchange rates required to explain 
trading profits is considerable. Taylor (1992) uses Monte Carlo methods to claim 
that trading rule profits can only be explained if the average reward for accepting 
the risky side of a one-month forward transaction is at least 2% of the forward 
price, which leads to an implausibly high reward-to-risk ratio. 

Bessembinder and Chan (1992) regress monthly returns from currency futures 
contracts on a constant and the lagged values of three instrumental variables: 
the equity dividend yield, the yield on three-month T-bills, and a “junk” bond 
yield premium. Significant dependence is found, from which monthly conditional 
expected returns are estimated. These are usually between −1% and 1% and are
consistent with a TVRP, although market inefficiency is an alternative expla- 
nation. Conditional expectations within the same range are obtained by Bams, 
Walkowiak, and Wolff (2004), by applying the Kalman filter to one-month for- 
ward prediction errors. 

Kho (1996) estimates a bivariate model for futures returns and excess returns 
from a world equity index that allows both the futures “beta” and the price of 
risk to vary through time. Between one-third and one-half of the excess returns 
obtained from trading rules can then be explained by the TVRP for futures returns. 
As excess returns adjusted for the estimated TVRP are not significantly different 
from zero, the trading results do not provide a clear-cut conclusion about the 
efficiency of the currency futures market. 

Central bank explanations are motivated by the possibility that these banks
delay adjustments to fundamental factors by “leaning against the wind,” so that 
profits can be made by trading rules that seek out trends (Szakmary and Mathur 
1997). LeBaron (1999) demonstrates that a large proportion of the trading profits 
from a single moving-average rule were earned on the days that the Federal 
Reserve intervened in the currency markets. From 1979 to 1992, the daily average 
excess return from the trading rule is 0.033% for the mark/dollar rate. This average 
falls to 0.008% when the 12% of days on which the Fed intervened are removed. 
Likewise, the yen/dollar average falls from 0.040% to 0.017% when 6% of the 
days are removed. 
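
A back-of-envelope calculation shows what these figures imply for the intervention days themselves. The sketch below is not taken from LeBaron (1999); it assumes that the reduced average is the mean over the remaining days only, and it uses the rounded inputs quoted above, so the outputs are approximate. The function name is hypothetical.

```python
def intervention_split(avg_all, avg_without, fraction_removed):
    """Return (average return on the removed days, their share of the total return).

    Uses the identity avg_all = f * avg_removed + (1 - f) * avg_without,
    where f is the fraction of days removed.
    """
    kept_contribution = (1.0 - fraction_removed) * avg_without
    removed_contribution = avg_all - kept_contribution
    return removed_contribution / fraction_removed, removed_contribution / avg_all


# Daily averages in percent, as quoted in the text.
mark = intervention_split(0.033, 0.008, 0.12)
yen = intervention_split(0.040, 0.017, 0.06)
```

Under these assumptions the intervention days account for roughly 79% of the mark/dollar trading return and roughly 60% of the yen/dollar return, with implied average daily returns of about 0.22% and 0.40% on those days.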

Saacke (2002) evaluates interventions by both the Fed and the Bundesbank. 
Neely (2002) also includes interventions by the monetary authorities in Switzer- 
land and Australia and he uses the same trading rule as LeBaron (1999), but 
from 1983 to 1998. The average annual excess return for the rule applied to the 
mark/dollar is 6.0% and it falls to 2.6% when Fed intervention days are removed. 
The Bundesbank intervened on more than twice as many days as the Fed; remov- 
ing these days reduces the average further, to 1.3%. By using several exchange 
rates per day, Neely concludes that high trading rule returns precede interventions. 
It appears that the monetary authorities intervene in response to short-term trends 
from which trading rules have recently profited. 

In conclusion, currency trading profits can be explained in many ways. The 
possibility that some of the profit opportunities available in the 1970s and the 
1980s were the result of an inefficient market cannot be dismissed. 


7.11 Theoretical Trading Profits for Autocorrelated Return Processes 


It may appear that the claimed excess returns for currency trades are contradicted 
by a lack of correlation between currency returns. From Chapters 4-6 we know
that there is almost no correlation between the returns from exchange rate futures 
on different dates. However, there is some relevant evidence that challenges the 
random walk hypothesis: a variance-ratio test rejects the hypothesis for two of 
the four currency futures series at the 5% level (Table 5.2, final column) and 
the trend test statistic T rejects the hypothesis for three of these series, again at 
the 5% level (Section 6.8). There are also several rejections at the same level of 
the similar hypothesis that daily returns are generated by a stationary martingale 
difference plus a constant (Section 7.5). 

The results from these hypothesis tests, together with the evidence for profitable 
currency futures trades, imply that trading rules may be able to exploit low levels 
of linear dependence between returns. This is confirmed in Taylor (1994b), by 
applying the channel trading rule to the prices of several thousand simulated 
futures contracts obtained from an ARMA(1, 1) process. The channel rule is 
preferred as it consistently outperformed the double moving-average and filter 
rules in a small simulation study contained in Taylor (1992), thereby supporting
its prior endorsement by Lukac et al. (1988). We now define the simulated process 
and then summarize some of the simulated trading outcomes. 


7.11.1 Monte Carlo Model 


As in Section 6.9, a correlated returns process $\{r_t\}$ is simulated by supposing that

$$
r_t = \mu_t + \sigma_t e_t. \qquad (7.21)
$$


The trend component $\{\mu_t\}$ is a Gaussian AR(1) process that has mean zero, variance
$A\sigma^2$, and autocorrelations $\phi^\tau$, while the residual component $\{\sigma_t e_t\}$ is uncorrelated
and has variance $(1 - A)\sigma^2$. The stochastic volatility process $\{\sigma_t\}$, to be
discussed in Chapter 11, is defined by supposing $\{\log(\sigma_t)\}$ is Gaussian and AR(1),
with mean $\alpha$, standard deviation $\beta$, and autocorrelations $\Phi^\tau$. The variables $e_t$ are
i.i.d. and normal with mean zero and variance $1 - A$. The three processes $\{\mu_t\}$,
$\{\sigma_t\}$, and $\{e_t\}$ are independent of each other.

As explained in Sections 3.5 and 3.6, the simulated returns process is an
ARMA(1, 1) process with autocorrelations $A\phi^\tau$. The parameter values A = 0.02
and $\phi$ = 0.95 are used in Taylor (1992, 1994b) because they are compatible
with the autocorrelations of daily and monthly currency returns. The correlation
between consecutive returns is then a mere 0.019. More important, however, is
that the sum of the autocorrelations over all positive lags is $A\phi/(1 - \phi) = 0.38$.
It is this high sum which explains the high level of the simulated trading profits.
The remaining simulation parameters are compatible with the average level and the
autocorrelations of squared daily currency returns: $\alpha = -5.15$, $\beta = 0.422$, and
$\Phi = 0.973$.
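
The process can be simulated with the Python standard library alone. The sketch below is an illustration, not the code used in Taylor (1992, 1994b). It assumes that the overall variance scale is the expected squared volatility, which for Gaussian log-volatility with mean alpha and standard deviation beta equals exp(2 alpha + 2 beta^2). The argument names `alpha`, `beta`, and `Phi` for the log-volatility parameters, and all function names, are labels chosen here.

```python
import math
import random


def lag_autocorrelation(lag, A=0.02, phi=0.95):
    """Theoretical autocorrelation A * phi**lag of the simulated returns."""
    return A * phi ** lag


def sum_of_autocorrelations(A=0.02, phi=0.95):
    """Sum of the autocorrelations over all positive lags: A * phi / (1 - phi)."""
    return A * phi / (1.0 - phi)


def simulate_returns(n, A=0.02, phi=0.95, alpha=-5.15, beta=0.422, Phi=0.973, seed=0):
    """Simulate n returns: trend plus volatility times noise, as in equation (7.21)."""
    rng = random.Random(seed)
    variance = math.exp(2.0 * alpha + 2.0 * beta ** 2)  # assumed E[sigma_t^2]
    trend_sd = math.sqrt(A * variance)
    trend = rng.gauss(0.0, trend_sd)       # stationary starting value
    log_vol = rng.gauss(alpha, beta)       # stationary starting value
    returns = []
    for _ in range(n):
        # AR(1) updates that preserve the stationary variances.
        trend = phi * trend + rng.gauss(0.0, trend_sd * math.sqrt(1.0 - phi ** 2))
        log_vol = alpha + Phi * (log_vol - alpha) \
            + rng.gauss(0.0, beta * math.sqrt(1.0 - Phi ** 2))
        noise = rng.gauss(0.0, math.sqrt(1.0 - A))
        returns.append(trend + math.exp(log_vol) * noise)
    return returns
```

With the default parameters the two helper functions reproduce the figures quoted in the text: a first-lag autocorrelation of 0.019 and a sum over all positive lags of 0.38.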


7.11.2 Monte Carlo Results 


We consider the expected value of the excess return from trading, defined by 
equation (7.20), for the channel rule described in Section 7.2 without a bandwidth 
parameter. The maximum expected value of 7.6% p.a. occurs for the above model 
parameters when the channel length parameter L is 14 days and there are no 
transaction costs. 

Higher values of L are optimal when trading is costly. When the proportional 
trading cost c equals 0.2%, the optimal L is 23 days and then the maximum 
expected net excess return is 5.6% p.a. More realistic results are obtained by 
supposing L is chosen ex ante after learning from a few years of trading results. 
Selecting the best L using a learning period of one, two, or four years gives 
expected net excess returns respectively equal to 4.0%, 4.4%, and 4.6% p.a.

The expected net excess returns from the simulations when A = 0.02 and $\phi$ =
0.95 are slightly less than the typical 7% p.a. for historical series of exchange rates
discussed in Section 7.10. The high historical level may be fortuitous, although
it can be matched by slightly increasing either A or $\phi$.

It is concluded that a simple trading rule can exploit very low levels of autocorrelation
when all the theoretical autocorrelations are positive. From simulations
for various values of A and $\phi$, the expected gross excess return is approximately
a positive constant multiplied by $-A\log(1 - \phi)$. The multiplicative constant
depends primarily upon the standard deviation of the returns process.

The success of the channel trading rule applied to the simulated ARMA(1, 1)
process must imply that the channel rule often correctly identifies the direction of
the current trend, represented by the process $\{\mu_t\}$. Suppose $q_t$ is either 1 or −1,
respectively if the trading rule is long or short during period $t$, and let $s_t$ be either 1
or −1, respectively if $\mu_t$ is positive or negative. The proportion of days for which
$q_t = s_t$ is denoted by P. It is a measure of directional forecasting accuracy, which
equals 50% for rules applied to random walks. The proportion P is between 61%
and 63% for many values of L and many learning periods when A = 0.02 and
$\phi$ = 0.95. This level of accuracy is particularly impressive when it is compared
with the 64% that can be attained by using the optimal ARMA(1, 1) forecast of
the trend component.


7.12 Concluding Remarks 


There is plenty of evidence that trading rules have been able to reveal information 
about the conditional distributions of future returns. Furthermore, some of this 
information has been sufficiently precise to allow trading rules to make profits 
that are several per cent per annum, after deducting transaction costs and adjusting 
for risk. 

The successful applications of trading rules are, however, generally restricted 
to prices recorded before the 1990s. Higher trading volumes, the arrival of cheap 
and almost instantaneous communication of information around the world, and 
the inexorable tendency for competition to eliminate price imperfections have 
probably all contributed to higher levels of market efficiency in recent years. 


An Introduction to Volatility


Asset price volatility is central to the following seven chapters, in which we cover 
volatility measurement and modeling, continuous-time processes, option pricing 
formulae, and volatility forecasting. This short introductory chapter commences 
with definitions of volatility and continues with a general discussion of explana- 
tions for volatility changes. It is then shown that volatility changes explain the 
major stylized facts for time series of asset returns. 


8.1 Definitions of Volatility 


Volatility is a measure of price variability over some period of time. It typically 
describes the standard deviation of returns, in a particular context that depends 
on the definition used. Alternatively, we can say that volatility is the standard 
deviation of the change in the logarithm of a price or a price index during a stated 
period of time. Volatility can be defined and interpreted in five different ways. 
Various interpretations appear in phrases such as: 


(i) The volatility of Microsoft stock is 30% per annum. 


(ii) The annualized volatility of the sterling/dollar exchange rate was 12% in 
2004. 


(iii) The volatility of tomorrow’s price is 1%, given our observations of recent 
prices. 


(iv) The volatility of prices follows a mean-reverting stochastic process. 


(v) The market volatility for the FTSE 100 index during the next three months 
is 15% per annum. 


In the first phrase volatility is simply a parameter, which is invariably denoted
by $\sigma$. The standard deviation of the continuously compounded return during any
T-year period is then $\sigma\sqrt{T}$, whatever the prior history of the asset's price. This
definition is applicable whenever prices are assumed to follow geometric Brownian
motion (GBM), defined in Chapter 13. The variance of a change in the price’s
logarithm, $\log(p(T_2)) - \log(p(T_1))$, is then proportional to the time difference,
$T_2 - T_1$ years, and equals $\sigma^2(T_2 - T_1)$. The assumption of GBM is unrealistic
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when describing market prices because future price variability does depend on 
the recent history of prices. Nevertheless, it can be a useful approximation to 
reality for other purposes, particularly when defining a parameter that permits the 
determination of rational option prices. Statement (1) could then describe one of 
the parameters used to value Microsoft options. 

Realized volatility, also called historical volatility, is the standard deviation of
a set of previous returns. For $n$ trading periods, and returns $r_{t-n}, \ldots, r_{t-1}$ whose
average is $\bar{r}$, the historical standard deviation

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(r_{t-i} - \bar{r})^2}$$

provides a simple estimate of the standard deviation of the return for trading
period $t$. This volatility measure can be stated in annual units as $s\sqrt{N}$, with
$N$ denoting the number of trading periods in one year. Statement (ii) may then
follow from a hypothetical calculation that finds the set of 252 daily returns in
that year have standard deviation 0.00756 so that $s\sqrt{N} = 0.12$. Recent research
into realized volatility has considered trading periods measured in minutes, with 
n chosen to provide a one-day history of intraday returns. Some properties of this 
high-frequency volatility measure are described in Chapter 12. 
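
The annualization step can be sketched in a few lines of Python. This is only an illustration: the function name `annualized_realized_volatility` is hypothetical, and the $n - 1$ divisor is assumed to be the usual sample-variance convention. The last lines repeat the hypothetical calculation behind statement (ii).

```python
import math

def annualized_realized_volatility(returns, periods_per_year=252):
    """Sample standard deviation of the returns, scaled to annual units."""
    n = len(returns)
    if n < 2:
        raise ValueError("need at least two returns")
    mean = sum(returns) / n
    # Assumption: the usual n - 1 divisor of the sample variance.
    variance = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return math.sqrt(variance) * math.sqrt(periods_per_year)

# Statement (ii): a daily standard deviation of 0.00756 over 252 trading days.
daily_sd = 0.00756
annual = daily_sd * math.sqrt(252)
print(round(annual, 2))
```

Any other number of trading periods per year can be supplied through `periods_per_year`, for example 52 for weekly returns.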

Conditional volatility is the standard deviation of a future return that is conditional
on known information such as the history of previous returns. Unlike realized
volatility, the expectation for the next period is calculated using a time-series
model that has been selected and estimated using appropriate data. Convenient
and accurate equations for volatility expectations are provided by ARCH models,
described in detail in Chapters 9 and 10. These autoregressive, conditional
heteroskedastic models specify the conditional variance $h_t$ of the return in period
$t$, using prior information $I_{t-1}$. A popular example is a weighted sum of squared
excess returns, defined by the recursive equation


$$h_t = \omega + \alpha(r_{t-1} - \mu)^2 + \beta h_{t-1},$$


with the parameters $\alpha$, $\beta$, $\mu$, and $\omega$ estimated from a long time series of returns.
Statement (iii) might be made if we had estimated these parameters using daily
returns until 23 November 2005 (day $t - 1$) and had then found $h_t = (0.01)^2$ for
the return from the 23rd to the 24th (day $t$).

Stochastic volatility processes are motivated by noting that volatility is not 
constant and hence it is interesting to seek to specify how volatility changes 
through time. Typical discrete-time models suppose that volatility is unobservable 
and then its stochastic properties may be inferred from either absolute or squared 
returns. A first-order autoregressive process provides a parsimonious description 
of the logarithm of volatility, which is evaluated in Chapter 11. Continuous-time 
models are used to price options when the assumption of constant volatility is
relaxed. A square-root process for volatility, discussed in Chapter 13, permits the 
rapid calculation of appropriate option prices. Statement (iv) indicates that high 
volatility levels are expected to be followed by lower levels; conversely, when the 
current level of volatility is low it is expected that volatility will increase. 

Implied volatility is a value calculated from an option price. It equals the volatility
parameter $\sigma$ for which an option's market price equals its theoretical price
according to a pricing formula. The Black-Scholes pricing formula provides
theoretical prices for European call options, say $c(\sigma)$, and assumes that the asset
price process is GBM with annual variance rate $\sigma^2$. As $c(\sigma)$ is an increasing
function of $\sigma$, for any market price $c_M$ between the lower and upper bounds that
exclude arbitrage profits there is a unique solution to the equation


$$c_M = c(\sigma)$$


that defines the implied volatility. These volatility measures depend on the time
until expiry and the exercise price of the option, as will be seen in Chapter 14.
Thus statement (v) might be made if the implied volatility of a European at-the-money
option on 17 September 2004 that expires on 17 December 2004 equals
0.15. Option markets are competitive and prices must incorporate the market’s 
expectations about future volatility. It is therefore reasonable to conjecture that 
implied volatilities are the best source of information when forecasting volatility. 
This hypothesis is investigated in Chapter 15. 
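
Because the theoretical call price is increasing in the volatility parameter, the defining equation can be solved by simple bisection. The sketch below assumes a European call on an asset that pays no dividends. The function names, the search interval, and the numerical inputs (spot 100, strike 100, interest rate 4%, three months to expiry) are hypothetical choices for illustration, not values from the text.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(sigma, S, K, r, T):
    """Black-Scholes price of a European call (no dividends assumed)."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

def implied_volatility(c_market, S, K, r, T, lo=1e-6, hi=5.0, tol=1e-10):
    """Solve c_market = bs_call(sigma) for sigma by bisection on [lo, hi]."""
    lower_bound = max(S - K * math.exp(-r * T), 0.0)
    if not (lower_bound < c_market < S):
        raise ValueError("market price is outside the no-arbitrage bounds")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # The call price increases with sigma, so move the bracket accordingly.
        if bs_call(mid, S, K, r, T) < c_market:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Round trip: price an at-the-money option at 15% and recover the volatility.
price = bs_call(0.15, S=100.0, K=100.0, r=0.04, T=0.25)
sigma_hat = implied_volatility(price, S=100.0, K=100.0, r=0.04, T=0.25)
```

Bisection is slower than Newton-type methods but cannot diverge, which makes it a convenient default when the market price is close to one of the arbitrage bounds.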

At any time the values of realized volatility, conditional volatility, unobservable 
stochastic volatility, and implied volatility will usually all be different, because 
different data and assumptions are employed when these values are calculated. 
To illustrate some differences, Figure 8.1 shows a year of annualized volatility 
numbers for the S&P 500 index, calculated once every day from June 2003 until 
May 2004. The three curves show annualized, percentage standard deviations for 
realized volatility (dark, continuous curve) defined by the 100-day standard devi- 
ation, conditional volatility (light, continuous) estimated from the GJR-GARCH 
model, to be introduced in Section 9.7, and implied volatility (dots) defined by a 
new version of the VIX index, which is calculated from S&P 500 option prices. 


8.2 Explanations of Changes in Volatility 


The volatility of asset prices is not the same at all times. Volatility clustering is 
seen in periods of high and low volatility when returns are plotted in time order 
(Chapter 2). Furthermore, the stylized fact that squared returns are positively 
autocorrelated (Chapter 4) is indicative of positive autocorrelation in the volatility 
process. In later chapters we will see that the parameters of ARCH and stochastic 
volatility models reject the hypothesis of constant volatility. We will also see
that traders do not believe volatility is constant, because implied volatilities vary
considerably through time.

Figure 8.1. A year of annualized volatility numbers.

So why does volatility change? There is no complete and satisfactory answer 
to this fundamental question. There are partial answers that explain some of the 
variation in volatility, but much variation remains unexplained. 

Stock market volatility increases during crises and then decreases in due course. 
The economic crisis known as the Great Depression was accompanied by fears 
of social unrest (Voth 2003) and very high volatility from 1929 to 1934, attaining 
levels that have rarely been observed in later years (Officer 1973; Schwert 1990a).
The political crisis triggered by the Watergate tapes in March 1973 created uncer- 
tainty about the US administration that was eventually resolved by the resignation 
of President Nixon in August 1974. Volatility increased substantially when the 
existence of the tapes became public knowledge (Hsu 1982). The stock market 
crash of October 1987 was a financial crisis that was followed by a short period 
of extraordinarily high volatility (Schwert 1990a,b). There are no satisfactory economic
explanations for this crisis, although we may note that volatility was high
immediately before the crash. Volatility was high before the terrorist attacks on 
11 September 2001 and went much higher when the US markets reopened six 
days later. 

Macroeconomic variables, such as inflation, employment, and GNP, have an 
impact upon the volatility of stock, foreign exchange and interest-rate markets. 
Scheduled macroeconomic news releases in the US coincide with the commence- 
ment of a few minutes of much higher volatility, both in the US (Ederington and 
Lee 1993; Fleming and Remolona 1999) and in the UK (Areal and Taylor 2002). 
Inflation, money growth, and industrial production explain a small proportion of
the variability of stock market volatility over longer periods of time (Schwert 
1989), so that most of the variation in volatility remains unexplained. 

Stock volatility is dependent, to some degree, on the level of the market. When 
prices fall, the value of firm equity relative to debt decreases and hence financial 
leverage increases. At the same time volatility increases on average (Black 1976b; 
Christie 1982). This phenomenon is discussed in Sections 9.7 and 10.2. It may, 
however, be a volatility effect that is not attributable to changes in debt/equity 
ratios (Duffee 1995). 

Theoretical asset pricing models can explain some variation in volatility, for 
example, models that assume investors have asymmetric information (Brock and 
LeBaron 1996; Timmermann 2001). Several theoretical models are mentioned by 
Johnson (2001), who proposes a novel theory that rational agents must infer the 
degree of persistence of fundamental shocks. More empirical research is needed 
to decide if theoretical models can explain the magnitude of day-to-day changes 
in volatility. 

Volatility is positively correlated with trading volume (Karpoff 1987; Gallant, 
Rossi, and Tauchen 1992). This does not imply that changes in volume cause 
changes in volatility, or vice versa. Indeed, it can be argued that it is more plausible
that there is no causal relationship between volatility and volume; whatever factors 
determine volatility may, simultaneously, also determine volume. 

Trading decisions and hence volatility must, at least in part, be determined by 
the information that reaches the market. A simple economic model is presented 
below that shows volatility increases as the amount of information increases. The 
market efficiently interprets information in this model. 

Empirical analysis of the relationship between volatility and information is 
difficult because we can only identify some of the relevant information. US stock 
market volatility has a weak relationship with daily counts of headlines reported 
by Dow Jones in the Broadtape and the Wall Street Journal (Mitchell and Mulherin 
1994) and an insignificant dependence on the number of news releases by Reuters 
on its North American Securities News wire (Berry and Howe 1994). The total 
number of news headlines provided by Reuters only has a minor impact upon 
foreign exchange volatility; however, counts of appropriate headlines selected 
using economic keywords have a more discernible impact (Melvin and Yin 2000; 
Chang and Taylor 2003). 


8.3 Volatility and Information Arrivals 


The following economic model shows that volatility can be related to a stochastic 
number of intraday price revisions. The origins of this model are in Clark (1973) 
and Epps and Epps (1976), followed by a rigorous economic analysis of infor- 
mation arrivals, volatility, and volume in Tauchen and Pitts (1983). The market is 
assumed to be efficient with daily returns $r_t$ having expected value $\mu$. The number
of news items on day $t$ is represented by a random variable, denoted by $N_t$. When
news item $i$ reaches the market, the price logarithm changes by $\varepsilon_{i,t}$ and these
changes are assumed to have zero mean. Then


$$r_t = \mu + \sum_{i=1}^{N_t} \varepsilon_{i,t}. \tag{8.1}$$


Now suppose further that the $\varepsilon_{i,t}$ are i.i.d., normal random variables, which are
independent of $N_t$. Let $\sigma_\varepsilon^2$ be the variance of $\varepsilon_{i,t}$. Then the distribution of the
return conditional on $n_t$ news items is normal with variance


$$\mathrm{var}(r_t \mid N_t = n_t) = n_t \sigma_\varepsilon^2. \tag{8.2}$$

It is then appropriate to define the stochastic volatility process by

$$\sigma_t^2 = N_t \sigma_\varepsilon^2 \tag{8.3}$$

and then

$$r_t = \mu + \sigma_t u_t \tag{8.4}$$


with $u_t$ a standard normal random variable that is independent of the random
variable $\sigma_t$.

From equation (8.2) it can be seen that volatility changes when the amount 
of relevant news changes. Also, volatility clustering will then occur if there is 
sufficient positive autocorrelation in the process of news counts to ensure that 
there are some periods of several consecutive days that have high news counts and 
others that have low counts. With further assumptions, expected trading volume 
is proportional to the number of news items and hence volatility and volume are 
positively correlated variables (Tauchen and Pitts 1983). 
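
A small simulation can illustrate the conditional variance result. The sketch below is not taken from the studies cited. It assumes, purely for illustration, that the daily news count is uniform on 1 to 20 and that each price revision has standard deviation 0.002. It then compares the sample variance of returns on days with $n$ news items against $n$ times the revision variance.

```python
import random
import statistics
from collections import defaultdict

def simulate_news_returns(days, mu=0.0, sigma_eps=0.002, max_news=20, seed=1):
    """Simulate daily returns as mu plus a random number of i.i.d. normal revisions."""
    rng = random.Random(seed)
    by_count = defaultdict(list)
    for _ in range(days):
        n_t = rng.randint(1, max_news)  # hypothetical distribution of news counts
        r_t = mu + sum(rng.gauss(0.0, sigma_eps) for _ in range(n_t))
        by_count[n_t].append(r_t)
    return by_count

sigma_eps = 0.002
by_count = simulate_news_returns(40000, sigma_eps=sigma_eps)

# Ratio of the sample variance to n * sigma_eps^2; values near one are expected.
ratios = {n: statistics.variance(rs) / (n * sigma_eps ** 2)
          for n, rs in sorted(by_count.items())}
for n in (1, 5, 10, 20):
    print(n, round(ratios[n], 3))
```

Replacing the uniform draw by a positively autocorrelated count process would additionally produce volatility clustering, as described above.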

The above information arrivals model does show why volatility can change, 
even if some of the assumptions can be criticized; for example, prices may change 
even when there is no new information. Further methods and results are pro- 
vided in several papers. Harris (1987) considers empirical transaction counts, 
Lamoureux and Lastrapes (1990) include volume in an ARCH model, Gallant, 
Hsieh, and Tauchen (1991) consider the implications of temporal dependence in 
daily news counts, and Andersen (1996) utilizes a microstructure framework that 
includes noise traders. Blair, Poon, and Taylor (2001a) propose and investigate 
multivariate extensions of the information arrival model. Luu and Martens (2003) 
show that intraday returns provide new insights into the relationship between 
volatility and volume, which support information arrival models. 


8.4 Volatility and the Stylized Facts for Returns


The three major stylized facts for daily returns presented in Chapter 4 can all 
be explained by assuming that volatility follows a stochastic process that has the 
property that today's volatility is positively correlated with the volatility on any 
future day. We suppose in this section that daily returns can be described by the 
equation 

$$r_t = \mu + \sigma_t u_t \tag{8.5}$$


with six assumptions, which are:


(i) the expected return $\mu$ is a constant;
(ii) $\sigma_t$ is a positive random variable, that has more than one possible realized
value;
(iii) the stochastic process $\{\sigma_t\}$ is stationary, $E[\sigma_t^4]$ is finite and all the autocorrelations
of $\{\sigma_t^2\}$ are positive;
(iv) $u_t$ is a standard normal random variable, so $u_t \sim N(0, 1)$;
(v) the $u_t$ are i.i.d. variables;
(vi) the processes $\{\sigma_t\}$ and $\{u_t\}$ are stochastically independent, i.e. the vector
variables $(\sigma_1, \sigma_2, \ldots, \sigma_n)$ and $(u_1, u_2, \ldots, u_n)$ are independent for all positive
integers $n$.


Equation (8.5) is identical to (8.4), but it is not necessary to make any of the 
intraday assumptions used to derive (8.4). 

The first stylized fact is that the distribution of returns is not normal. From (8.5) 
the distribution of returns is a mixture of normal distributions, with the mixture 
determined by the distribution of volatility. This mixture distribution has higher 
kurtosis than that of a normal distribution, since 


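$$\mathrm{var}(r_t) = E[(r_t - \mu)^2] = E[\sigma_t^2 u_t^2] = E[\sigma_t^2]E[u_t^2] = E[\sigma_t^2],$$

$$E[(r_t - \mu)^4] = E[\sigma_t^4 u_t^4] = E[\sigma_t^4]E[u_t^4] = 3E[\sigma_t^4],$$

$$\mathrm{kurtosis}(r_t) = \frac{3E[\sigma_t^4]}{E[\sigma_t^2]^2} = 3\left(1 + \frac{\mathrm{var}(\sigma_t^2)}{E[\sigma_t^2]^2}\right) > 3. \tag{8.6}$$

The second stylized fact is that returns are almost uncorrelated. The autocorrelations
are zero at all positive lags $\tau$ when the assumptions apply, because

$$\begin{aligned}
\mathrm{cov}(r_t, r_{t+\tau}) &= \mathrm{cov}(\sigma_t u_t, \sigma_{t+\tau} u_{t+\tau}) \\
&= E[\sigma_t u_t \sigma_{t+\tau} u_{t+\tau}] - E[\sigma_t u_t]E[\sigma_{t+\tau} u_{t+\tau}] \\
&= E[\sigma_t \sigma_{t+\tau}]E[u_t]E[u_{t+\tau}] - E[\sigma_t]E[\sigma_{t+\tau}]E[u_t]E[u_{t+\tau}] \\
&= 0 - 0 \\
&= 0.
\end{aligned} \tag{8.7}$$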
The third stylized fact is that both absolute returns and squared returns are positively
autocorrelated. Let $s_t = (r_t - \mu)^2$. Then, for all positive lags $\tau$,

$$\begin{aligned}
\mathrm{cov}(s_t, s_{t+\tau}) &= \mathrm{cov}(\sigma_t^2 u_t^2, \sigma_{t+\tau}^2 u_{t+\tau}^2) \\
&= E[\sigma_t^2 u_t^2 \sigma_{t+\tau}^2 u_{t+\tau}^2] - E[\sigma_t^2 u_t^2]E[\sigma_{t+\tau}^2 u_{t+\tau}^2] \\
&= E[\sigma_t^2 \sigma_{t+\tau}^2]E[u_t^2]E[u_{t+\tau}^2] - E[\sigma_t^2]E[\sigma_{t+\tau}^2]E[u_t^2]E[u_{t+\tau}^2] \\
&= \mathrm{cov}(\sigma_t^2, \sigma_{t+\tau}^2) \\
&> 0.
\end{aligned} \tag{8.8}$$

Consequently, positive dependence in the volatility process implies positive dependence
in squared excess returns. Likewise, it can be shown that there is also
positive dependence in absolute excess returns, $a_t = |r_t - \mu|$.

The six assumptions that follow equation (8.5) suffice to provide a framework 
within which volatility changes explain the major stylized facts for returns. This 
framework is developed further in Chapter 11, by considering specific stochastic 
processes for $\{\sigma_t\}$. Some of the assumptions can be relaxed, as indeed they are
in Chapter 9 for ARCH models, without altering the conclusion that volatility 
changes explain the stylized facts. 
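
The three results can be checked by simulation. The sketch below is only one possible instance of the framework: it assumes that the logarithm of volatility follows a Gaussian first-order autoregressive process with hypothetical parameters (persistence 0.95, innovation standard deviation 0.1, median daily volatility 1%), which satisfies the six assumptions. All function names are illustrative.

```python
import math
import random

def simulate_sv_returns(n, mu=0.0, phi=0.95, eta_sd=0.1,
                        mean_log_sigma=math.log(0.01), seed=7):
    """Returns r_t = mu + sigma_t * u_t with AR(1) log volatility (an assumption)."""
    rng = random.Random(seed)
    # Start the log-volatility deviation from its stationary distribution.
    x = rng.gauss(0.0, eta_sd / math.sqrt(1.0 - phi ** 2))
    returns = []
    for _ in range(n):
        sigma_t = math.exp(mean_log_sigma + x)
        returns.append(mu + sigma_t * rng.gauss(0.0, 1.0))  # u_t independent of sigma_t
        x = phi * x + rng.gauss(0.0, eta_sd)
    return returns

def kurtosis(x):
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / m2 ** 2

def autocorr(x, lag):
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    numer = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return numer / denom

r = simulate_sv_returns(50000)
squared = [v ** 2 for v in r]  # mu = 0 here, so these are squared excess returns
k = kurtosis(r)                # expected to exceed three
rho_r = autocorr(r, 1)         # expected to be near zero
rho_s = autocorr(squared, 1)   # expected to be positive
print(round(k, 2), round(rho_r, 3), round(rho_s, 3))
```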


8.5 Concluding Remarks 


The sources of volatility changes are elusive. Time-series models for volatility 
can be estimated from asset returns without knowing why volatility changes, and 
this is the path that we take in the next four chapters. Later we consider volatility 
forecasting, and then take into account the additional volatility information that 
is revealed by option prices. 


ARCH Models: Definitions and Examples


Examples of models for the conditional variances of returns are described and 
estimated in this chapter. These models are easy to estimate from a time series 
of returns and provide insights into the movement of volatility through time. The 
models belong within a general class of ARCH models that is also defined. 


9.1 Introduction 


ARCH stands for autoregressive conditional heteroskedasticity. Changes in the 
scale of a variable give us the word heteroskedastic. A scale parameter is a standard 
deviation or a variance and the variable of interest here is the return from an asset. 
The variance of a return, conditional on the information in previous returns, is 
found to depend on this information. Engle (1982) defined a stochastic process 
whose variables have conditional mean zero and conditional variance given by 
a linear function of previous squared variables. The squared variables follow an 
autoregressive process in his pioneering and influential research, as we will see 
in Section 9.2. 

Subsequent research has provided many alternative functions that specify the 
conditional variance of a variable at time t as a function of information known at 
time $t - 1$. For any specification that also gives us the conditional density function
at time $t$, we will call the stochastic process an ARCH model provided that the
standardized residuals of the process are independent and identically distributed.
In particular, we allow the conditional mean to vary through time, unlike some
authors who prefer to restrict the acronym ARCH to processes whose conditional 
means are always zero. 

There is a multitude of ARCH specifications and many of them have their own 
acronyms, the best known being GARCH (generalized ARCH) from Bollerslev 
(1986) and EGARCH (exponential, generalized ARCH) from Nelson (1991). 
The popularity of the models can be explained by their compatibility with the 
major stylized facts for asset returns, by efficient methods for estimating model 
parameters and by the availability of useful volatility forecasts. The specification 
of conditional densities provides the likelihood function for a dataset, which can 
be maximized to give optimal parameter estimates. Several software packages 
will maximize the likelihood function and thus estimation of an ARCH model is
now a routine activity. Likelihood theory allows specifications to be compared 
and choices to be made from among the many functions that have been proposed 
for conditional variances. 

The literature on ARCH models is considerable. Bollerslev, Chou, and Kroner 
(1992) provide a review of theory and ten years of empirical evidence for financial 
markets, which covers applications to equities, foreign exchange, and interest 
rates. The authors describe an impressive number of interesting studies without 
requiring the reader to understand many equations. The subsequent survey by 
Bollerslev, Engle, and Nelson (1994) is suitable for those readers who wish to 
see more theory than is presented in this book. It also contains detailed examples 
of the specification of conditional densities for daily returns from US equity 
indices, going back as far as 1885. A neglected precursor to the ARCH literature 
is a working paper by Rosenberg (1972), while early likelihood estimates of an 
integrated ARCH specification can be found in Taylor and Kingsman (1979). 

Some basic ARCH models and the general ARCH framework are described in 
this chapter, followed by more complex models, likelihood theory, and modeling 
strategies in the next chapter. We primarily consider models for daily returns in 
these two chapters and also note some studies of weekly and monthly returns. 
Models for intraday returns require additional parameters to represent intraday 
volatility patterns, as will be explained in Chapter 12. There are many applications 
of ARCH methods in finance research, including investigations into asset pric- 
ing, hedging, and microstructure effects. Option pricing, volatility forecasting, 
and density estimation are also important application areas, that are described in 
Chapters 14, 15, and 16 respectively. 

This chapter continues with a brief account of the ARCH(1) model in Section 9.2.
This is followed by a detailed description of the popular GARCH(1, 1)
model in Section 9.3 and an example of its estimation and results for foreign 
exchange returns in Section 9.4. The general ARCH framework is explained in 
Section 9.5, with the assumption that the conditional distributions are normal. 
This assumption is relaxed in Section 9.6. GARCH models are much less sat- 
isfactory for equities than for foreign exchange, because the direction of price 
changes is relevant when modeling equity volatility. An asymmetric volatility 
model is described in Section 9.7 and estimated for a series of index returns in 
Section 9.8. 


9.2 ARCH(1) 


The simplest example of an ARCH process is the ARCH(1) specification presented 
by Engle (1982). The distribution of the return for period $t$, conditional on all
previous returns, is normal with constant mean $\mu$ and time-varying conditional
variance $h_t$ defined by

$$r_t \mid r_{t-1}, r_{t-2}, \ldots \sim N(\mu, h_t) \tag{9.1}$$

and

$$h_t = \omega + \alpha(r_{t-1} - \mu)^2. \tag{9.2}$$

The volatility parameters are $\omega > 0$ and $\alpha \ge 0$. The volatility of the return in
period $t$ then depends solely on the previous return. Either a large positive or a
large negative return in period $t - 1$ implies higher than average volatility in the
next period when $\alpha$ is positive; conversely, returns near the mean level $\mu$ imply
lower than average future volatility.

The residual at time $t$ is

$$e_t = r_t - \mu$$

and the forecast error when predicting squared residuals is

$$v_t = e_t^2 - E[e_t^2 \mid r_{t-1}, r_{t-2}, \ldots] = e_t^2 - h_t.$$

These forecast errors are uncorrelated. Replacing $h_t$ in (9.2) by $e_t^2 - v_t$ gives

$$e_t^2 = \omega + \alpha e_{t-1}^2 + v_t$$

and hence squared residuals follow an AR(1) process. This explains the AR part
of the ARCH acronym.

The ARCH(1) model is stationary when $\alpha < 1$. It cannot describe the returns
process successfully, because squared residuals have autocorrelations that cannot
be approximated by the autocorrelation function $\rho_\tau = \alpha^{|\tau|}$; these autocorrelations
are defined when $e_t^2$ has finite variance, which requires $3\alpha^2 < 1$. Any satisfactory
AR(p) process for squared residuals must have a high order p. A natural alternative 
is an ARMA process and this explains the interest in GARCH models. 


The GARCH(1, 1) model with conditional normal distributions is the most popular
ARCH specification in empirical research, particularly when modeling daily
returns. The letter “G” appears in the acronym of this model because it is gen- 
eralized from ARCH(1) by including a lagged variance term in the conditional 
variance equation. The popularity of GARCH(1, 1) may be explained by three 
observations. First, the model has only four parameters and these can be estimated 
easily. Second, it provides an explanation of the major stylized facts for daily 
returns. Third, it is often found that the volatility forecasts from this specification 
have similar accuracy to forecasts from more complicated specifications. Initially, 
we assume conditional normal distributions following Bollerslev (1986) and Taylor
(1986), who independently defined and derived properties of the GARCH(1, 1)
model.


9.3.1 Definitions


The distribution of the return for period $t$, conditional on all previous returns, is
defined by

$$r_t \mid r_{t-1}, r_{t-2}, \ldots \sim N(\mu, h_t)$$

with

$$h_t = \omega + \alpha(r_{t-1} - \mu)^2 + \beta h_{t-1}. \tag{9.3}$$

There are four parameters, namely $\mu$, $\alpha$, $\beta$, and $\omega$. The constraints $\omega \ge 0$, $\alpha \ge 0$,
and $\beta \ge 0$ are required to ensure that the conditional variance is never negative.
The possibility $\alpha = 0$ is of no interest and so we assume $\alpha$ is positive. The model
is styled GARCH(1, 1) because one previous squared residual and one previous
value of the conditional variance are used to define the conditional variance for
period $t$. Calculations of conditional variances from the recursive definition (9.3)
are straightforward, providing an initial value is available for the first time period. 
Numerical examples are provided in Section 9.4. 
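
As a minimal sketch of the recursion, the function below computes conditional variances from a list of returns. The function name, the three returns, the parameter values, and the starting value are all hypothetical. In practice the initial variance might, for example, be set equal to a sample variance.

```python
def garch11_variances(returns, mu, omega, alpha, beta, h_first):
    """Conditional variances h_1, ..., h_n for the periods of the given returns.

    h_first is the assumed starting value for the first period; each later
    value follows h_t = omega + alpha * (r_{t-1} - mu)^2 + beta * h_{t-1}.
    """
    h = [h_first]
    for r_prev in returns[:-1]:
        h.append(omega + alpha * (r_prev - mu) ** 2 + beta * h[-1])
    return h

# Hypothetical inputs: three daily returns and illustrative parameter values.
h = garch11_variances([0.01, -0.02, 0.005], mu=0.0, omega=1e-6,
                      alpha=0.06, beta=0.92, h_first=1e-4)
print(h)
```

With these inputs the second variance is $10^{-6} + 0.06 \times 10^{-4} + 0.92 \times 10^{-4} = 9.9 \times 10^{-5}$, and the larger squared return in the second period then raises the third variance to about $1.16 \times 10^{-4}$.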

The major properties of a GARCH(1, 1) stochastic process can be summarized
in one sentence. The process is stationary if $\alpha + \beta < 1$ and then


* the unconditional variance is finite;
* the unconditional kurtosis always exceeds three and can be infinite;
* the correlation between the returns $r_t$ and $r_{t+\tau}$ is zero for all $\tau > 0$; and
* the correlation between the squared residual $s_t = (r_t - \mu)^2$ and $s_{t+\tau}$ is
positive for all $\tau > 0$ and equals $C(\alpha + \beta)^\tau$, with $C$ positive and determined
by both $\alpha$ and $\beta$.


The process is now discussed in some detail, commencing with a second definition 
and then covering stationarity, an ARCH($\infty$) representation, selected moments of
the process, its autocorrelations, an integrated specification, and finally prediction 
of future variances. 

The residuals of the process are

$$e_t = r_t - \mu$$

and the standardized residuals are defined to be

$$z_t = \frac{r_t - \mu}{\sqrt{h_t}} = \frac{e_t}{\sqrt{h_t}}. \tag{9.4}$$

The distribution of $z_t$ conditional on previous returns is then

$$z_t \mid r_{t-1}, r_{t-2}, \ldots \sim N(0, 1)$$

and thus it does not depend on the past history of returns. With minor additional
assumptions, it follows that the $z_t$ are independent and identically distributed
(i.i.d.). The formal definition of GARCH(1, 1) with conditional normal distributions
used here is based on the i.i.d. assumption; thus,

$$r_t = \mu + h_t^{1/2} z_t, \tag{9.5}$$

$$z_t \sim \text{i.i.d. } N(0, 1), \tag{9.6}$$

and

$$h_t = \omega + \alpha(r_{t-1} - \mu)^2 + \beta h_{t-1}. \tag{9.7}$$


9.3.2 Stationarity


From (9.5) and (9.7),

$$h_t = \omega + (\alpha z_{t-1}^2 + \beta)h_{t-1}, \tag{9.8}$$

and hence

$$E[h_t] = \omega + (\alpha + \beta)E[h_{t-1}] \tag{9.9}$$

whenever these expectations are finite, because $z_{t-1}$ is independent of $h_{t-1}$. From
(9.9) it can be anticipated that the process is covariance stationary if and only if
$\alpha + \beta < 1$. Bollerslev (1986) proves this result, assuming $\omega$ is positive and that
the process starts indefinitely far in the past. The process is also strictly stationary
when $\alpha + \beta < 1$ and the same assumptions apply (Nelson 1990a). The constraints
$\alpha + \beta < 1$ and $\omega > 0$ are now assumed until stated otherwise.


9.3.3 Another Representation 


The conditional variance depends on all previous returns when $\beta$ is positive, so that
the covariance stationary GARCH(1, 1) process has an ARCH($\infty$) representation.
From (9.7) applied one time period earlier, $h_{t-1} = \omega + \alpha(r_{t-2} - \mu)^2 + \beta h_{t-2}$,
and hence (9.7) can be rewritten as

$$\begin{aligned}
h_t &= \omega + \alpha(r_{t-1} - \mu)^2 + \beta[\omega + \alpha(r_{t-2} - \mu)^2 + \beta h_{t-2}] \\
&= \omega + \beta\omega + \alpha(r_{t-1} - \mu)^2 + \alpha\beta(r_{t-2} - \mu)^2 + \beta^2 h_{t-2}.
\end{aligned}$$

Repeated substitutions then show that

$$h_t = \frac{\omega}{1 - \beta} + \alpha(r_{t-1} - \mu)^2 + \alpha\beta(r_{t-2} - \mu)^2 + \alpha\beta^2(r_{t-3} - \mu)^2 + \alpha\beta^3(r_{t-4} - \mu)^2 + \cdots, \tag{9.10}$$

where it is assumed that the process has an infinite past history. This equation
shows that the conditional variance for period $t$ is a linear function of the past
squared residuals $(r_{t-\tau} - \mu)^2$, $\tau > 0$, and that the weight given to past information
diminishes as the lag $\tau$ increases. The conditional variance is an increasing
function of each squared residual so that volatility clusters will occur. A high
average for recent squared returns will make the conditional variance high so that
a high squared return is more likely in the next period and vice versa for a low
recent average. Equation (9.10) also shows that the conditional variances have a
minimum level, that equals $\omega/(1 - \beta)$.
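
The equivalence of the recursive and ARCH($\infty$) forms can be checked numerically. The sketch below uses hypothetical parameter values and an arbitrary simulated history of 500 returns. With a finite history the two calculations differ only by terms of order $\beta^{500}$, which are negligible here.

```python
import random

def garch11_recursion(returns, mu, omega, alpha, beta, h_first):
    """Apply the GARCH(1, 1) recursion through the whole history of returns."""
    h = h_first
    for r in returns:
        h = omega + alpha * (r - mu) ** 2 + beta * h
    return h  # conditional variance for the period after the last return

def arch_infinity_sum(returns, mu, omega, alpha, beta):
    """omega / (1 - beta) plus geometrically weighted past squared residuals."""
    total = omega / (1.0 - beta)
    weight = alpha
    for r in reversed(returns):  # most recent return first
        total += weight * (r - mu) ** 2
        weight *= beta
    return total

rng = random.Random(3)
returns = [rng.gauss(0.0005, 0.01) for _ in range(500)]  # arbitrary history
params = dict(mu=0.0005, omega=2e-6, alpha=0.06, beta=0.92)

h_rec = garch11_recursion(returns, h_first=1e-4, **params)
h_sum = arch_infinity_sum(returns, **params)
print(h_rec, h_sum)
```

The weighted sum can never fall below `omega / (1 - beta)`, which is the minimum level of the conditional variance.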


9.3.4 Moments 


Properties of the stochastic process $\{r_t\}$ are derived from the assumption that the
standardized residuals $\{z_t\}$ are i.i.d. and also from the fact that $h_t$ is a function of
the previous values $\{z_{t-1}, z_{t-2}, z_{t-3}, \ldots\}$. The unconditional mean return is

$$E[r_t] = \mu + E[h_t^{1/2}]E[z_t] = \mu$$

and the unconditional variance, denoted by $\sigma^2$, equals

$$\sigma^2 = \mathrm{var}(r_t) = E[(r_t - \mu)^2] = E[h_t z_t^2] = E[h_t] = \frac{\omega}{1 - \alpha - \beta}, \tag{9.11}$$

from (9.9). The conditional variance equation, (9.7), can be rewritten using (9.11)
as

$$h_t = (1 - \alpha - \beta)\sigma^2 + \alpha(r_{t-1} - \mu)^2 + \beta h_{t-1}.$$

Thus $h_t$ is a weighted combination of the unconditional variance $\sigma^2$, the previous
squared residual $(r_{t-1} - \mu)^2$, and the previous conditional variance $h_{t-1}$, with
respective weights $1 - \alpha - \beta$, $\alpha$, and $\beta$.

The unconditional fourth moment is finite if and only if

$$2\alpha^2 + (\alpha + \beta)^2 < 1 \tag{9.12}$$

and then

$$E[(r_t - \mu)^4] = E[h_t^2]E[z_t^4] = 3E[h_t^2].$$

The unconditional expectation of $h_t^2$ can be derived by squaring both sides of
equation (9.8), followed by taking expectations of both sides. This leads to the
result that the kurtosis of returns is

$$\mathrm{kurtosis}(r_t) = \frac{3E[h_t^2]}{\sigma^4} = \frac{3(1 - (\alpha + \beta)^2)}{1 - 2\alpha^2 - (\alpha + \beta)^2} > 3 \tag{9.13}$$

when $2\alpha^2 + (\alpha + \beta)^2 < 1$; otherwise, the kurtosis is infinite. Returns therefore
have more kurtosis than the normal distribution, which is a consequence of the 
unconditional distribution being a mixture of normals. 
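
The moment formulae are easily evaluated. The sketch below codes the unconditional variance and kurtosis of a stationary GARCH(1, 1) process. The function name and the value of $\omega$ are hypothetical, and the values 0.06 and 0.92 for the other two volatility parameters are used only as an illustration.

```python
def garch11_moments(omega, alpha, beta):
    """Unconditional variance and kurtosis of a stationary GARCH(1, 1) process."""
    persistence = alpha + beta
    if persistence >= 1.0:
        raise ValueError("covariance stationarity requires alpha + beta < 1")
    variance = omega / (1.0 - persistence)
    fourth_moment_term = 2.0 * alpha ** 2 + persistence ** 2
    if fourth_moment_term >= 1.0:
        kurtosis = float("inf")  # the fourth moment is not finite
    else:
        kurtosis = 3.0 * (1.0 - persistence ** 2) / (1.0 - fourth_moment_term)
    return variance, kurtosis

variance, kurtosis = garch11_moments(omega=2e-6, alpha=0.06, beta=0.92)
print(variance, kurtosis)
```

For these values the variance is $2 \times 10^{-6} / 0.02 = 10^{-4}$ and the kurtosis is $3 \times 0.0396 / 0.0324 \approx 3.67$, which is above the normal value of three.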


9.3.5 Autocorrelations 


The returns process is uncorrelated, because the conditional mean of the return 
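at time $t$ is constant, whatever the returns before time $t$. Formally, for all positive
lags $\tau$,

$$\begin{aligned}
\mathrm{cov}(r_t, r_{t+\tau}) = \mathrm{cov}(e_t, e_{t+\tau}) &= E[h_t^{1/2} z_t h_{t+\tau}^{1/2} z_{t+\tau}] \\
&= E[h_t^{1/2} z_t h_{t+\tau}^{1/2}]E[z_{t+\tau}] = 0.
\end{aligned} \tag{9.14}$$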
The returns process is not, of course, a sequence of i.i.d. variables because conditional
variances are a function of previous returns.

The autocorrelations of the squared residuals,

$$s_t = (r_t - \mu)^2 = e_t^2 = h_t z_t^2,$$

are only defined when the kurtosis of returns is finite, which we now assume. Let

$$v_t = e_t^2 - h_t = h_t(z_t^2 - 1) \tag{9.15}$$

be the forecast error when predicting squared residuals. Then the process $\{v_t\}$ is
white noise, since its mean is zero, it is covariance stationary and its autocovariances
are

$$\begin{aligned}
\mathrm{cov}(v_t, v_{t+\tau}) = E[v_t v_{t+\tau}] &= E[h_t(z_t^2 - 1)h_{t+\tau}(z_{t+\tau}^2 - 1)] \\
&= E[h_t(z_t^2 - 1)h_{t+\tau}]E[z_{t+\tau}^2 - 1] = 0
\end{aligned}$$

for all positive lags. From (9.7) and (9.15) it follows that the squared residuals $s_t$
follow an ARMA(1, 1) process with innovations provided by the forecast errors
$v_t$; thus,

$$s_t = \omega + (\alpha + \beta)s_{t-1} + v_t - \beta v_{t-1}. \tag{9.16}$$


The autocorrelations of an ARMA(1, 1) process are derived in Section 3.5. From 
equation (3.26) it follows that the autocorrelations of the process {s;} are 


$$\mathrm{cor}(s_t, s_{t+\tau}) = C(\alpha, \beta)(\alpha + \beta)^\tau, \quad \tau > 0, \tag{9.17}$$


with 
a(l — of — B?) 

(a + B)(1 — 2aB — B?) 
The term C (o, £) is positive whenever the kurtosis is finite. Hence the autocorrela- 
tions of the squared residuals are all positive and decline geometrically, whenever 
these autocorrelations are defined. For the typical values o = 0.06 and B = 0.92, 
C = 0.139 and the autocorrelations at lags 1, 10, 25, and 50 are respectively 
0.137, 0.114, 0.084, and 0.051. 

When the kurtosis of returns is infinite, the sample autocorrelations of a very 


C(a, B) = (9.18) 


long series of data {s;} are still given by the geometric decay formula, (9.17), but 
with C(a, B) = 5 (3a + B)/ (a + B) (see Ding and Granger 1996). 
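
The numerical values quoted above can be checked with a short Python sketch. It only illustrates equations (9.17) and (9.18) for the finite-kurtosis case with conditional normal distributions; the function names are hypothetical and are not part of the original text.

```python
def acf_scale(alpha, beta):
    """C(alpha, beta) from equation (9.18)."""
    numerator = alpha * (1 - alpha * beta - beta**2)
    denominator = (alpha + beta) * (1 - 2 * alpha * beta - beta**2)
    return numerator / denominator


def squared_residual_autocorrelation(lag, alpha, beta):
    """Autocorrelation of squared residuals at a positive lag, equation (9.17)."""
    return acf_scale(alpha, beta) * (alpha + beta) ** lag


# Typical parameter values used in the text.
for lag in (1, 10, 25, 50):
    print(lag, round(squared_residual_autocorrelation(lag, 0.06, 0.92), 3))
```

The geometric decay is slow because the persistence 0.98 is close to one, which is why the autocorrelation at lag 50 is still more than a third of its value at lag 1.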

The arguments used to show that the GARCH(1, 1) model can explain the 
major stylized facts for returns are similar to, yet different from, those given in 
Section 8.4. The first five assumptions in Section 8.4 are satisfied by a stationary 
GARCH(1, 1) model, with \(\sigma_t^2 = h_t\), but the sixth assumption is not. Although 
the random variables \(h_t\) and \(z_t\) are independent, the stochastic processes \(\{h_t\}\) 
and \(\{z_t\}\) are not stochastically independent; in particular, \(h_t\) is a function of 
\(\{z_{t-1}, z_{t-2}, \ldots\}\), stated as follows by Nelson (1990a): 

\[ h_t = \omega \left( 1 + \sum_{k=1}^{\infty} \prod_{i=1}^{k} (\alpha z_{t-i}^2 + \beta) \right). \]


9.3.6 Integrated Specification 


Empirical estimates of the sum of the parameters \(\alpha\) and \(\beta\) are often near one and 
sometimes the sum exceeds one if the parameters are not constrained. Consequently, 
Engle and Bollerslev (1986) consider the integrated specification when 
\(\alpha + \beta = 1\), known as IGARCH(1, 1), with \(\omega\) only constrained to be nonnegative: 

\[ h_t = \omega + \alpha (r_{t-1} - \mu)^2 + (1 - \alpha) h_{t-1}. \tag{9.19} \]

Nelson (1990a) discusses the mathematical properties of the integrated process. 
Although it is not covariance stationary, it is strictly stationary when 
\(\omega\) is positive and the process has an indefinite past; also, the unconditional 
variance of returns is then infinite, but this does not prevent the calculation of 
conditional variances. Furthermore, strictly stationary models can also be defined when 
\(\alpha + \beta > 1\) providing \(E[\log(\alpha z_t^2 + \beta)] < 0\). 


9.3.7 Forecasts 


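The history of returns is irrelevant when forecasting returns, because the 
standardized returns are i.i.d. The best forecast of \(r_{t+n}\), given a history of returns 
\(\{r_t, r_{t-1}, \ldots\}\), is thus the constant mean \(\mu\) for all positive forecast horizons \(n\), 
when accuracy is measured by the expected squared forecast error. 

The history of returns can instead be used to calculate the conditional variances 
for all horizons. These are denoted by \(\hat{s}_{t,n}\), because they also equal the conditional 
expectation of the squared residual \(s_{t+n} = (r_{t+n} - \mu)^2\). At time \(t\), the history 
provides 

\[ \hat{s}_{t,1} = h_{t+1} = \operatorname{var}(r_{t+1} \mid r_t, r_{t-1}, \ldots) = E[s_{t+1} \mid r_t, r_{t-1}, \ldots]. \]

Initially, suppose \(\alpha + \beta < 1\). Then we can obtain 

\[
\begin{aligned}
\hat{s}_{t,2} &= \operatorname{var}(r_{t+2} \mid r_t, r_{t-1}, \ldots) = E[s_{t+2} \mid r_t, r_{t-1}, \ldots] \\
&= E[e_{t+2}^2 \mid r_t, r_{t-1}, \ldots] = E[h_{t+2} \mid r_t, r_{t-1}, \ldots] \\
&= E[\omega + (\alpha z_{t+1}^2 + \beta) h_{t+1} \mid r_t, r_{t-1}, \ldots] \\
&= \omega + (\alpha + \beta) h_{t+1} = (1 - \alpha - \beta)\sigma^2 + (\alpha + \beta) h_{t+1} \\
&= \sigma^2 + (\alpha + \beta)(h_{t+1} - \sigma^2)
\end{aligned}
\]

with \(\sigma^2\) the unconditional variance of the process, given in (9.11). Likewise, for 
\(n \ge 2\), 

\[
\begin{aligned}
\hat{s}_{t,n} &= \operatorname{var}(r_{t+n} \mid r_t, r_{t-1}, \ldots) = E[h_{t+n} \mid r_t, r_{t-1}, \ldots] \\
&= \omega + (\alpha + \beta) E[h_{t+n-1} \mid r_t, r_{t-1}, \ldots] = \omega + (\alpha + \beta)\hat{s}_{t,n-1} \\
&= \sigma^2 + (\alpha + \beta)(\hat{s}_{t,n-1} - \sigma^2).
\end{aligned}
\]

Hence, for all \(n \ge 1\), 

\[ \operatorname{var}(r_{t+n} \mid r_t, r_{t-1}, \ldots) = \sigma^2 + (\alpha + \beta)^{n-1} (h_{t+1} - \sigma^2). \tag{9.20} \]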


This result can also be derived from the ARMA representation given previously by 
equation (9.16). Expectations of future volatility, as measured by the conditional 
variance of returns, are seen to revert geometrically towards the unconditional 
variance as the forecast horizon increases. The sum \(\alpha + \beta\) determines the rate of 
reversion and is often referred to as the persistence parameter of the process. 
Similar methods provide results for the integrated case when \(\alpha + \beta = 1\), these 
being 

\[ \operatorname{var}(r_{t+n} \mid r_t, r_{t-1}, \ldots) = (n - 1)\omega + h_{t+1}. \tag{9.21} \]

Conditional variances for the total return over \(n\) periods are simply sums of 
one-period conditional variances, thus 

\[
\begin{aligned}
\operatorname{var}(r_{t+1} + \cdots + r_{t+n} \mid r_t, r_{t-1}, \ldots) &= \sum_{j=1}^{n} \operatorname{var}(r_{t+j} \mid r_t, r_{t-1}, \ldots) \\
&= n\sigma^2 + \frac{1 - (\alpha + \beta)^n}{1 - \alpha - \beta} (h_{t+1} - \sigma^2)
\end{aligned}
\tag{9.22}
\]

for the stationary process and 

\[ \operatorname{var}(r_{t+1} + \cdots + r_{t+n} \mid r_t, r_{t-1}, \ldots) = \tfrac{1}{2} n(n - 1)\omega + n h_{t+1} \tag{9.23} \]

for the integrated process. 
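
The forecasting formulae (9.20) to (9.23) translate directly into code. The following Python sketch is an illustration only; the function names and the example inputs are assumptions made here, and the multi-period forecasts are computed as explicit sums so that they can be compared with the closed forms (9.22) and (9.23).

```python
def variance_forecast(n, h_next, omega, alpha, beta):
    """Stationary case, equation (9.20): var(r_{t+n} | history), with h_next = h_{t+1}."""
    persistence = alpha + beta
    if not persistence < 1:
        raise ValueError("equation (9.20) requires alpha + beta < 1")
    sigma2 = omega / (1 - persistence)  # unconditional variance
    return sigma2 + persistence ** (n - 1) * (h_next - sigma2)


def total_variance_forecast(n, h_next, omega, alpha, beta):
    """Variance of the n-period return: the sum of one-period forecasts, as in (9.22)."""
    return sum(variance_forecast(j, h_next, omega, alpha, beta) for j in range(1, n + 1))


def igarch_variance_forecast(n, h_next, omega):
    """Integrated case, equation (9.21)."""
    return (n - 1) * omega + h_next


def igarch_total_variance_forecast(n, h_next, omega):
    """Integrated case: the sum of one-period forecasts, as in (9.23)."""
    return sum(igarch_variance_forecast(j, h_next, omega) for j in range(1, n + 1))
```

Summing the one-period forecasts keeps the code close to the argument in the text; the closed forms are preferable when \(n\) is large.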


9.4 An Exchange Rate Example of the GARCH(1, 1) Model 


The GARCH(1, 1) model, with conditional normal distributions, is defined by 
equations (9.5)-(9.7). Prices \(p_t\), returns \(r_t\), conditional variances \(h_t\), and 
standardized residuals \(z_t\) are connected by the system of equations 

\[ r_t = \log(p_t / p_{t-1}) = \mu + h_t^{1/2} z_t \tag{9.24} \]

and 

\[ h_t = \omega + \alpha (r_{t-1} - \mu)^2 + \beta h_{t-1} = \omega + (\alpha z_{t-1}^2 + \beta) h_{t-1}, \tag{9.25} \]

here ignoring any dividends when defining the returns. Numerical values have to 
be assigned to \(\mu\), \(\alpha\), \(\beta\), and \(\omega\). 

Calculation of the conditional variances and estimation of the model parameters 
is easy for this ARCH specification, and can be accomplished without difficulty 
using Excel software. An example is provided here for the daily DM/$ exchange 
rate from January 1991 to December 2000, defined and graphed in Section 2.2. 
Readers who are not interested in ARCH calculations should skip to Section 9.4.3, 
"DM/$ results, 1991-2000". 


9.4.1 Conditional Variances 


Exhibit 9.1 illustrates the calculation of conditional variances and shows the 
results for a few days. First, the time series of 2591 daily exchange rates is 
located in cells B17 to B2607. These rates are the Deutsche mark price of one dol- 
lar. Next, returns are calculated. The return for 3 January 1991, which is period 
t = 1, equals 0.000 536 and is obtained by inserting =LN(B18/B17) into cell 
C18. Returns for the subsequent periods are obtained by selecting and copying 
cell C18, followed by pasting it into cells C19 to C2607. Some summary statis- 
tics for the set of 2590 returns are given in cells B3 to B10, obtained from the 
functions AVERAGE, STDEV, SKEW, KURT, CORREL, MIN, and MAX. The 
returns range from —4.0% to 3.4%, have more kurtosis than a normal distribution, 
and a sample standard deviation of 0.67%. The correlation between consecutive 
returns is 0.018 and provides no evidence of significant correlation. 

The only technical issue in the calculations is the value of the conditional 
variance for the first period. The variance of the complete sample of returns is 
used in Exhibit 9.1, to give the contents of cell D18 as \(h_1 = 4.501 \times 10^{-5}\). 
Alternative possibilities are either to set \(h_1\) equal to the unconditional variance, 
\(\sigma^2 = \omega/(1 - \alpha - \beta)\), or to include \(h_1\) in the set of parameters that are to be 
estimated. The formulae for selected cells, including D18, are given in Table 9.1. 

The calculations in Exhibit 9.1 are for the values \(\mu = 0.0001\), \(\alpha = 0.06\), 
\(\beta = 0.92\), and \(\sigma^2 = 4 \times 10^{-5}\), so that \(\omega = 8 \times 10^{-7}\). The four parameter values 
of the model are in cells G3 to G6. The values of \(\mu\) and \(\sigma^2\) are based upon the 
summary statistics, while the values of \(\alpha\) and \(\beta\) are typical values in the literature. 
Returning to the calculations in row 18, the first standardized residual, \(z_1\), is 
given by the formula =(C18-$G$3)/SQRT(D18) in cell E18; here the "absolute 
reference" $G$3 is used to ensure that subsequent copying and pasting gives the 
correct results. The remaining cell in the row, G18, is explained later. 

The calculations for time period \(t = 2\) commence with \(h_2 = \omega + (\alpha z_1^2 + \beta) h_1\). 
This can be obtained by inserting into cell D19 the formula given in Table 9.1, 
where again absolute references are used to refer to the model's parameters. Then 
\(z_2\) is given by copying and pasting cell E18 into cell E19. 


[Exhibit 9.1 is a spreadsheet image that is not reproduced here. Its data columns show, for each date, the return, the conditional variance, the standardized residual, and the log density. Its parameter cells hold mu, omega, alpha, and beta, together with the optimization parameters (mu*1000, alpha, persistence, and variance*10000) and the log-likelihood, 9389.09.] 

Exhibit 9.1. An example of GARCH(1, 1) calculations for the DM/$ rate. 


It is now possible to evaluate all the remaining conditional variances, for times 
\(t \ge 3\); select cells D19 and E19, then copy and paste them into the rectangle 
D20:E2607, whose corners are D20, E20, D2607, and E2607. 

Table 9.1. Formulae used in the GARCH(1, 1) spreadsheets. 


Cell    Formula 

C18     =LN(B18/B17) 
D18     =B4*B4 
D19     =$G$4+(($G$5*E18*E18)+$G$6)*D18 
E18     =(C18-$G$3)/SQRT(D18) 
G3      =0.001*G8 
G4      =G11*(1-G10)/10000 
G5      =G9 
G6      =G10-G9 
G18     =-0.5*(LN(2*PI())+LN(D18)+(E18*E18)) 
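
For readers who prefer a programming language to a spreadsheet, the same recursion can be sketched in Python. This is an illustrative translation of the formulae for cells D19, E18, and G18, not the author's implementation; the function name and the use of plain lists are choices made here.

```python
import math


def garch11_filter(returns, mu, omega, alpha, beta, h1):
    """Return lists (h, z, l): conditional variances, standardized residuals, log densities."""
    h, z, l = [], [], []
    h_t = h1
    for r in returns:
        z_t = (r - mu) / math.sqrt(h_t)                                 # as in cell E18
        l_t = -0.5 * (math.log(2 * math.pi) + math.log(h_t) + z_t**2)   # as in cell G18
        h.append(h_t)
        z.append(z_t)
        l.append(l_t)
        h_t = omega + (alpha * z_t**2 + beta) * h_t                     # as in cell D19
    return h, z, l


# First period of the example: r_1 = 0.000536 and h_1 = 4.501e-5, as quoted in the text.
h, z, l = garch11_filter([0.000536], mu=0.0001, omega=8e-7, alpha=0.06, beta=0.92, h1=4.501e-5)
log_likelihood = sum(l)  # with the full series this corresponds to SUM(G18:G2607)
```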


9.4.2 Parameter Estimation 


The parameters \(\mu\), \(\alpha\), \(\beta\), and \(\omega\) can be estimated from the returns data by seeking 
the values giving the maximum likelihood of obtaining the observed data. The 
theory of maximum likelihood estimation for ARCH models is introduced in 
Section 9.5 and reviewed in some detail in Section 10.4. For the present time we 
simply note that appropriate estimates for \(n\) returns are given by maximizing the 
log-likelihood function, 

\[ \log L = \sum_{t=1}^{n} l_t \tag{9.26} \]

with 

\[ l_t = -\tfrac{1}{2} [\log(2\pi) + \log(h_t) + z_t^2]. \tag{9.27} \]

Each term \(l_t\) is a function of \(\mu\), \(\alpha\), \(\beta\), and \(\omega\). 

Column G of Exhibit 9.1 includes the values of \(l_t\). The formula for \(l_1\), in cell 
G18, is given in Table 9.1. Copying and pasting provides G19, etc. The value of the 
log-likelihood is then SUM(G18:G2607) and is shown in cell G13 to be 9389.09. 
Higher values of the log-likelihood are obtained by maximizing the contents of 
cell G13 by changing the values in cells G3 to G6. 

Perhaps the easiest way to maximize the log-likelihood function is to reparametrize 
the function. Maximizing over \(\mu\), \(\alpha\), \(\beta\), and \(\omega\) is the same as maximizing over 
\(\mu\), \(\alpha\), the persistence \(\alpha + \beta\), and the unconditional variance \(\sigma^2 = \omega/(1 - \alpha - \beta)\). 
Scaling some of these terms makes their magnitudes comparable, which can make 
it easier to perform the maximization. Results are given here when the maximization 
is over the "optimization parameters" defined as \(10^3\mu\), \(\alpha\), \(\alpha + \beta\), and \(10^4\sigma^2\). 
The original spreadsheet is modified slightly to find the parameter estimates. The 
values of the "optimization parameters" are placed in cells G8 to G11 and then 
\(\mu\), \(\alpha\), \(\beta\), and \(\omega\) are calculated in cells G3 to G6. 

Maximization of the log-likelihood function can be achieved by use of the Excel 
tool called Solver. The constraints \(\alpha \ge 0.0001\), \(\alpha + \beta \le 0.9999\), and \(\sigma^2 \ge 0\) are 
appropriate when estimating the stationary version of the GARCH(1, 1) model. 
Solver is used to maximize cell G13 with the constraints applied to cells G9, 
G10, and G11. Commencing the maximization at the parameter values given in 
Exhibit 9.1 produces the optimal values shown in Exhibit 9.2. A highly desirable 
property of any maximization algorithm is that the answer does not depend on the 
initial values. This is indeed the case for the combination of algorithm, model, 
and data discussed here. 
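
The reparametrization can be sketched in Python as a mapping from the scaled "optimization parameters" back to the model parameters, following the formulae for cells G3 to G6 in Table 9.1. The function name is hypothetical, and treating the constraints as non-strict bounds is an assumption made for this illustration; any numerical optimizer could then search over the four scaled values.

```python
def to_model_parameters(mu_x1000, alpha, persistence, variance_x10000):
    """Map the scaled optimization parameters to (mu, omega, alpha, beta)."""
    # Constraints for the stationary model, treated here as non-strict bounds (an assumption).
    if not (alpha >= 0.0001 and persistence <= 0.9999 and variance_x10000 >= 0):
        raise ValueError("parameters violate the constraints for the stationary model")
    mu = 0.001 * mu_x1000                                  # as in cell G3
    omega = variance_x10000 * (1 - persistence) / 10000    # as in cell G4
    beta = persistence - alpha                             # as in cell G6
    return mu, omega, alpha, beta


# Starting values of Exhibit 9.1: mu = 0.0001, alpha = 0.06, persistence = 0.98, variance = 4e-5.
mu, omega, alpha, beta = to_model_parameters(0.1, 0.06, 0.98, 0.4)
```

With this scaling all four search variables are of order 0.1 to 1, which is the reason given in the text for reparametrizing before maximizing.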


9.4.3 DM/$ results, 1991-2000 


The maximum likelihood estimates \(\hat{\mu}\) and \(\hat{\sigma}^2 = \hat{\omega}/(1 - \hat{\alpha} - \hat{\beta})\) of the unconditional 
mean and variance of the returns process are respectively close to the sample 
mean and variance. The parameter estimates shown in Exhibit 9.2 also include 
\(\hat{\alpha} = 0.0354\), \(\hat{\beta} = 0.9554\), and the persistence estimate \(\hat{\alpha} + \hat{\beta} = 0.9908\). Robust 
standard errors for these three estimates, calculated using matrices given later in 
Section 10.4, are 0.0081, 0.0097, and 0.0048 respectively. The unit root hypothesis 
\(\alpha + \beta = 1\) is rejected at the 5% level in Section 10.5. The theoretical kurtosis of 
returns is found to be finite for the parameter estimates, by checking the inequality 
in equation (9.12). 

It is notable that the persistence estimate is nearer to one than values reported 
for daily observations of the DM/$ rate in earlier periods. Taylor (1994a) estimates 
\(\hat{\alpha} = 0.099\), \(\hat{\beta} = 0.871\), and \(\hat{\alpha} + \hat{\beta} = 0.970\) for the period from December 1977 to 
November 1990, while Bollerslev et al. (1994) estimate \(\hat{\alpha} = 0.068\), \(\hat{\beta} = 0.880\), 
and \(\hat{\alpha} + \hat{\beta} = 0.948\) for the period from January 1981 to July 1992. Both studies 
reject a unit root at low significance levels. 

Exhibit 9.2 shows summary statistics for the standardized residuals in cells D3 
to D10, based upon the estimated parameters. Their sample mean and variance 
are very near to the theoretical values of 0 and 1 respectively. Their skewness and 
kurtosis, however, both show that these standardized returns are not a sample from 
a normal distribution. Instead, the kurtosis (4.61), the minimum (—4.99), and the 
maximum values (4.35) suggest a fat-tailed distribution is more appropriate. 

Figure 9.1 shows the ten-year time series of volatility estimates from 1991 to 
2000 given by the annualized conditional standard deviations, \(\sigma_t = \sqrt{259 h_t}\); 
the scaling constant (259) is the average number of returns per annum in the 
time series. The volatility estimates are plotted as percentages and range from 
6.5% in July 1996 to 19.1% in August 1991. Half of the estimates are inside the 
interquartile range, from 8.8% to 11.7%. The median and mean values are 10.3% 
and 10.6% respectively. The early estimates depend on the value of \(h_1\). If \(h_1\) is 
treated as an additional parameter, then firstly \(\sigma_1\) becomes 13.7%, rather than the 
10.8% shown in Figure 9.1, and secondly the estimates of the original parameters 
change by negligible amounts. 

Exhibit 9.2. GARCH(1, 1) parameter estimates for the DM/$ rate, 1991-2000. 


Figures 9.2 and 9.3 show the annualized conditional standard deviations in more 
detail, respectively for one year of high volatility and for one year of low volatility. 
These figures also show the annualized percentage returns, clustered around the 
horizontal axis. The scales are the same for the two figures, to emphasize the 
differences between volatility during the two highlighted years. 

Figure 9.2. DM/$ volatility and returns in 1992. 


Figure 9.2 shows high levels of Deutsche mark volatility throughout 1992, with 
a range from 9.6% to 18.7%. An extreme return of 3.4% on 9 January is responsible 
for the sudden increase of \(\sigma_t\) from 12.8% to 16.2%. After a week of very 
high volatility, the estimates generally decline and fall below the median level 
in May. Much more dispersion can be seen in the returns during September and 
October and this translates into higher levels of volatility. These months coincide 
with a crisis in the management of the European exchange rate mechanism, 
which included the withdrawal of the British pound from this mechanism on 16 
September. There were four notable returns on 11, 14, 16, and 17 September, all 
beyond ±2.3%, which raised the volatility estimates from 12.0% to 18.3%. The 
period above the very high level of 17% lasts for about six weeks and is followed 
by a steady decline in volatility, to a year-end value of 12.4%. 

Figure 9.3. DM/$ volatility and returns in 1996. 

Figure 9.3, in contrast, shows much lower levels of volatility in 1996. The 
lowest level of 6.5% is followed by 8.3% on the next day, responding to a return 
of —1.7% on 16 July. The highest volatility estimate is 9.4% and hence volatility 
throughout the year is below the ten-year median level. 

Volatility forecasts for the stationary GARCH(1, 1) model revert towards a 
constant as the forecast horizon increases. The conditional annualized standard 
deviation at time \(t\), for time period \(t + n\), is given by \(\sigma_{t,n} = \sqrt{259\,\hat{s}_{t,n}}\), with \(\hat{s}_{t,n}\) 
the conditional variance of \(r_{t+n}\) defined by equation (9.20). Figure 9.4 shows forecasts up to six months into 
the future, when \(\sigma_{t,1}\) is either 8% or 14%. These forecast functions converge to 
\(\sqrt{259\hat{\sigma}^2} = 10.9\%\) as \(n \to \infty\). The rate of convergence depends on the persistence 
parameter, \(\alpha + \beta\). When \(n = H + 1\), with \(H = \log(0.5)/\log(\alpha + \beta)\), variance 
forecasts are halfway between the first forecast and the eventual limit. The half-life 
\(H\) is estimated to be 75 trading periods, which equals 3.47 months. 
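
The half-life calculation is easy to reproduce. The Python sketch below is illustrative only; the function names are hypothetical, and the forecast function works with annualized variances, which is equivalent to applying equation (9.20) and then multiplying by 259.

```python
import math


def half_life(persistence):
    """H such that persistence**H equals 0.5."""
    return math.log(0.5) / math.log(persistence)


def volatility_forecast(n, first_vol, limit_vol, persistence):
    """Annualized standard deviation forecast for horizon n, from equation (9.20)."""
    first_var, limit_var = first_vol**2, limit_vol**2
    return math.sqrt(limit_var + persistence ** (n - 1) * (first_var - limit_var))


H = half_life(0.9908)        # persistence estimate for the DM/$ rate
months = H * 12 / 259        # 259 returns per annum on average
```

At horizon \(n = H + 1\) the variance forecast returned by `volatility_forecast`, once squared, lies midway between the squared first forecast and the squared limit.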


9.5 A General ARCH Framework 


There are very many specifications of ARCH models in finance literature. The next 
two sections summarize their common features and provide some examples. The 
essential ingredients of ARCH models are conditional density functions, which 
describe the density of the next return conditional on information that is known 
at the present time. These densities are often assumed to be normal, as they are 
in this section. More flexibility is provided by permitting nonnormal conditional 
densities, which are illustrated in the next section. 

Figure 9.4. Volatility forecasts. 

The general set-up makes use of trading periods indexed by \(t\), returns \(r_t\), relevant 
information \(I_{t-1}\) known at time \(t - 1\), a vector of parameters denoted by 
\(\theta\), conditional mean functions \(\mu_t\), and conditional variance functions \(h_t\). The 
conditional functions for time \(t\) are defined using the information known at 
time \(t - 1\) and the parameters \(\theta\). Naturally \(I_{t-1}\) includes \(r_{t-1}\) and all the prior 
information \(I_{t-2}\). For the GARCH(1, 1) example considered in Sections 9.3 and 
9.4, \(I_{t-1}\) is the history of returns \(\{r_{t-1}, r_{t-2}, \ldots\}\), \(\mu_t\) is a constant value \(\mu\), 
\(h_t = \omega + \alpha (r_{t-1} - \mu)^2 + \beta h_{t-1}\), and \(\theta = (\mu, \omega, \alpha, \beta)'\). 

In the general set-up, with conditional normality assumed, 

\[ r_t \mid I_{t-1} \sim N(\mu_t, h_t), \tag{9.28} \]

with both \(\mu_t\) and \(h_t\) functions of \(I_{t-1}\) and \(\theta\). We will often refer to the residual 

\[ e_t = r_t - \mu_t, \tag{9.29} \]

which has conditional distribution 

\[ e_t \mid I_{t-1} \sim N(0, h_t), \]

and to the standardized residual 

\[ z_t = \frac{r_t - \mu_t}{\sqrt{h_t}}, \tag{9.30} \]

which has conditional distribution 

\[ z_t \mid I_{t-1} \sim N(0, 1). \]

As \(I_{t-1}\) suffices to determine \(z_{t-1}\) and all previous values of the z-process, it 
follows that the standardized residuals are independent and identically distributed 
(i.i.d.). 

Equation (9.28) can be interpreted in two ways. First, it provides a way to use 
observed time series to calculate numbers \(\mu_t\) and \(h_t\) at time \(t - 1\) and hence a 
specific conditional distribution for the random variable \(r_t\). The observations are 
then assumed to start at time \(t = 1\) and some initial information \(I_0\) is assumed to 
be available. Second, equation (9.28) can be viewed as summarizing a stochastic 
process, formally defined by 

\[ r_t = \mu_t + h_t^{1/2} z_t, \quad z_t \sim \text{i.i.d. } N(0, 1). \tag{9.31} \]


This process could start at time t = 1, when initial information is required, or it 
could be defined for all integer times (including negative t) whenever the process 
is stationary. The stochastic process perspective is used to define models, while 
the time series perspective is used to estimate and test parameters and to produce 
predictive density functions. The general set-up is very flexible. Some of the more 
important specifications that have been used are now mentioned, after stating the 
criterion used to estimate parameters. 


9.5.1 Estimation 


The complete specification of conditional densities by (9.28) explains why ARCH 
models are a convenient way to model volatility. The product of conditional 
densities \(f(r_t \mid I_{t-1}, \theta)\) can be maximized to provide an appropriate estimate 
of the parameters \(\theta\) from a set of \(n\) observed returns \(\{r_1, r_2, \ldots, r_n\}\). The product, 
as a function of \(\theta\), is 

\[ L(\theta) = f(r_1 \mid I_0, \theta)\, f(r_2 \mid I_1, \theta) \cdots f(r_n \mid I_{n-1}, \theta) \tag{9.32} \]

and its logarithm equals 

\[
\begin{aligned}
\log L(\theta) &= \sum_{t=1}^{n} \log f(r_t \mid I_{t-1}, \theta) \\
&= \sum_{t=1}^{n} \left[ -\tfrac{1}{2}\log(2\pi) - \tfrac{1}{2}\log(h_t(\theta)) - \frac{(r_t - \mu_t(\theta))^2}{2 h_t(\theta)} \right] \\
&= -\tfrac{1}{2} \sum_{t=1}^{n} \left[ \log(2\pi) + \log(h_t(\theta)) + z_t^2(\theta) \right]
\end{aligned}
\tag{9.33}
\]

because the conditional densities have normal distributions. Equation (9.32) is 
simply the multivariate density of the returns, \(f(r_1, r_2, \ldots, r_n \mid I_0, \theta)\), when the 
information sets are the histories of the observed returns and then \(L(\theta)\) is the 
likelihood function. We will also use this name for \(L(\theta)\) when more information 
is used to define the conditional densities. Maximization of (9.33) provides the 
maximum likelihood estimate \(\hat{\theta}\), whose properties are described in Section 10.4. 


9.5.2 Information 


The information set \(I_{t-1}\) is very often restricted to the history of returns 
\(\{r_{t-1}, r_{t-2}, \ldots\}\). It is sometimes augmented by calendar information, so that 
calendar effects can be modeled. Additional market information known at time \(t - 1\) 
can be included. Interest rates can help to specify time-varying expected returns. 
Options information is a particularly interesting source of volatility information 
that is investigated in detail in Chapter 15. Trading volume information has also 
been considered but with less success. 


9.5.3 Conditional Means 


Relatively few specifications have been suggested for conditional means, compared 
with the variety investigated for conditional variances. This reflects the 
general lack of important correlation in returns. The simplest specification for \(\mu_t\) 
is a constant. Some people include day-of-the-week and other dummy variables 
to capture calendar anomalies. Many people use an MA(1) specification to model 
the very weak correlation in the returns process; then 

\[ \mu_t = \mu + \Theta e_{t-1} \tag{9.34} \]

and the two parameters \(\mu\) and \(\Theta\) are included in the vector \(\theta\). 

It is intuitive to suppose that expected returns increase as risk increases. The 
conditional variance is one measure of risk, so that a plausible specification of 
the conditional mean is 

\[ \mu_t = \xi + \lambda h_t^{1/2}. \tag{9.35} \]

This is an example of the ARCH-in-mean model of Engle, Lilien, and Robins 
(1987). An asset that has zero conditional variance is risk-free so it is logical 
to identify \(\xi\) with the risk-free interest rate and to require this parameter to be 
positive. The price of risk parameter, \(\lambda\), should also be positive for many assets. 
Both \(\xi\) and \(\lambda\) are included in the vector \(\theta\). The assumption of a constant risk-free 
rate can be removed when interest-rate data \(\{i_t\}\) are available, by then specifying 
\(\mu_t = i_t + \lambda h_t^{1/2}\). Any model that represents \(\mu_t\) as a function of \(h_t\) is called 
an ARCH-M model. These models have autocorrelation in the returns process, 
because conditional variances and hence conditional means are autocorrelated; 
details are given in Hong (1991). 


9.5.4 Conditional Variances 


The passage of time has seen increasingly sophisticated attempts to describe 
conditional variances. The GARCH(p, q) model of Bollerslev (1986) generalizes 
the original ARCH(p) model of Engle (1982). The conditional variance then 
depends on the \(p\) most recent squared residuals and the \(q\) most recent conditional 
variances, thus 

\[ h_t = \omega + \sum_{i=1}^{p} \alpha_i e_{t-i}^2 + \sum_{j=1}^{q} \beta_j h_{t-j}. \tag{9.36} \]

In many empirical studies it is found that \(p = q = 1\) is appropriate. The residuals 
process is covariance stationary when \(\alpha_1 + \cdots + \alpha_p + \beta_1 + \cdots + \beta_q < 1\). The 
autocorrelations of squared residuals are described in Bollerslev (1988), conditions 
for nonnegative conditional variances are provided by Nelson and Cao (1992), 
aggregation results are given in Drost and Nijman (1993), and option pricing 
formulae are derived by Duan (1995). 
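
A minimal Python sketch of the recursion (9.36) follows. It is an illustration rather than a definitive implementation: the function names are hypothetical, and the choice to seed every unavailable lagged squared residual and lagged variance with a single value `h_init` is an assumption, in the same spirit as the treatment of \(h_1\) in Section 9.4.

```python
def garch_pq_variances(residuals, omega, alphas, betas, h_init):
    """Conditional variances from equation (9.36).

    residuals[t] is e_t; alphas has length p and betas has length q.
    Pre-sample squared residuals and variances are set to h_init (an assumption).
    """
    p, q = len(alphas), len(betas)
    h = []
    for t in range(len(residuals)):
        h_t = omega
        for i in range(1, p + 1):
            e2 = residuals[t - i] ** 2 if t - i >= 0 else h_init
            h_t += alphas[i - 1] * e2
        for j in range(1, q + 1):
            h_lag = h[t - j] if t - j >= 0 else h_init
            h_t += betas[j - 1] * h_lag
        h.append(h_t)
    return h


def is_covariance_stationary(alphas, betas):
    """Check the condition that all alpha and beta coefficients sum to less than one."""
    return sum(alphas) + sum(betas) < 1
```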

The signs of the residuals \(e_{t-i}\) are irrelevant in the GARCH model. Nelson 
(1991) shows that the symmetric treatment of positive and negative residuals is 
not appropriate for US stock market returns. His exponential GARCH model is 
one of the first of many specifications that involve asymmetric functions of the 
residuals (Engle 1990; Glosten, Jagannathan, and Runkle 1993; Zakoian 1994; 
Sentana 1995). Some of these specifications are described in Sections 9.7 and 
10.2 and one of them is estimated in Section 9.8. In some cases the specifications 
can be considered to represent two regimes, corresponding to either a rising or a 
falling price. More complicated regime switching models are defined from either 
price information (Fornari and Mele 1996; Anderson, Nam, and Vahid 1999) or 
from unobservable states (Cai 1994; Hamilton and Susmel 1994). 

The autocorrelations of the squared residuals decay rapidly towards zero for 
GARCH and many other specifications. There is evidence, however, that the 
empirical autocorrelations decay slowly so that long memory specifications de- 
serve consideration. Long memory ARCH models are introduced in Baillie, 
Bollerslev, and Mikkelsen (1996) and in Bollerslev and Mikkelsen (1996) and 
they will be considered in Section 10.3. 

Linear functions of squared residuals appear in many specifications of the 
conditional variance, as in the GARCH model. An equally plausible starting point 
is a linear function of absolute residuals that defines the conditional standard 
deviation, for example, 


\[ h_t^{1/2} = \omega + \sum_{i=1}^{p} \alpha_i |e_{t-i}| + \sum_{j=1}^{q} \beta_j h_{t-j}^{1/2}, \tag{9.37} \]


with special cases in Taylor and Kingsman (1979), Taylor (1986), and Schwert 
(1989). Absolute values, rather than squares, are employed in the EGARCH model 
of Nelson (1991), described in Section 10.2, and in several other specifications. 
Finally, the impact of nontrading periods can be modeled by multiplicative 
effects. For daily data, let \(h_t^{*}\) be the appropriate conditional variance for a return 
calculated over a 24-hour period. A higher level is often appropriate if trading 
period \(t\) includes the weekend and/or holidays for which the market is closed; 
the multiplier \(m_t = h_t / h_t^{*}\) then exceeds 1. One adaptation of the GARCH(1, 1) 
model is 

\[
\begin{aligned}
r_t &= \mu + h_t^{1/2} z_t, \quad z_t \sim \text{i.i.d. } N(0, 1), \\
h_t &= m_t h_t^{*}, && (9.38) \\
h_t^{*} &= \omega + (\alpha z_{t-1}^2 + \beta) h_{t-1}^{*}, && (9.39) \\
m_t &= 1 + W w_t + V v_t, && (9.40)
\end{aligned}
\]

with \(w_t\) and \(v_t\) respectively counting the number of weekend days (Saturdays and 
Sundays) and the number of vacation days during trading period \(t\). The parameter 
vector is then \(\theta = (\mu, \omega, \alpha, \beta, W, V)'\). Similar multipliers \(m_t\) are defined and 
estimated in Nelson (1991), Bollerslev et al. (1994), and Taylor (1994a). 
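
The multiplier in (9.40) and the scaling in (9.38) can be sketched in a few lines of Python. The function names are hypothetical and the values of \(W\) and \(V\) used in any call are purely illustrative; the text does not report estimates of these parameters at this point.

```python
def nontrading_multiplier(weekend_days, vacation_days, W, V):
    """m_t from equation (9.40)."""
    return 1.0 + W * weekend_days + V * vacation_days


def scaled_variances(h_star, weekend_days, vacation_days, W, V):
    """h_t = m_t * h*_t from equation (9.38), applied period by period."""
    return [
        nontrading_multiplier(w, v, W, V) * hs
        for hs, w, v in zip(h_star, weekend_days, vacation_days)
    ]
```

For example, a Monday return that spans a two-day weekend has \(w_t = 2\) and \(v_t = 0\), so its conditional variance is scaled by \(1 + 2W\).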

The ARCH family is well populated and its variety can be bewildering, particu- 
larly when considering the specification of the conditional variance. Bollerslev et 
al. (1994, p. 2971) aptly remark, “The richness of the family of parametric ARCH 
models is both a blessing and a curse. It certainly complicates the search for the 
‘true’ model, and leaves quite a bit of arbitrariness in the model selection stage. 
On the other hand, the flexibility of the ARCH class of models means that in 
the analysis of structural economic models with time varying volatility, there is a 
good chance that an appropriate parametric ARCH model can be formulated that 
will make the analysis tractable." Methods for guiding the selection of a specific 
model are presented later, particularly in Sections 10.5 and 10.6. 


9.6 Nonnormal Conditional Distributions 


Empirical evidence, commencing with Engle and Bollerslev (1986) and Bollerslev 
(1987), has often contradicted the assumption that returns have conditional normal 
distributions. Very often the distribution of the estimated standardized residuals, 
\(\hat{z}_t\), obtained from observed returns and a parameter estimate \(\hat{\theta}\), has excess kurtosis. 
The assumption \(z_t \sim N(0, 1)\) is then untenable when seeking a satisfactory 
description of the returns process. This is important if a researcher requires a 
model that is compatible with empirical evidence, for example, when attempting 
to derive the density of future prices by Monte Carlo methods. 

A false assumption of normality is not serious, however, when estimating ARCH 
model parameters that are only used to calculate conditional variances or when 
testing parameter restrictions. Indeed, for econometric reasons, it can be beneficial 
to assume normal distributions when the assumption is known to be false, as 
discussed in Section 10.4. Consequently, for some purposes, modeling nonnormal 
conditional distributions is not necessary. 

The most general ARCH set-up that we now consider supposes that the 
standardized residuals are i.i.d. with zero mean, unit variance, and a distribution that 
depends on one or more parameters. This distribution is denoted by \(D(0, 1)\) and 
now 

\[ r_t = \mu_t + h_t^{1/2} z_t, \quad z_t \sim \text{i.i.d. } D(0, 1). \tag{9.41} \]

The conditional distributions of returns are now represented by 

\[ r_t \mid I_{t-1} \sim D(\mu_t, h_t), \tag{9.42} \]

and, as before, \(\mu_t\) and \(h_t\) are functions of information \(I_{t-1}\) and the parameter 
vector \(\theta\). 


9.6.1 Examples 


The two most popular choices for the distribution \(D(0, 1)\) are the standardized 
t-distribution (Bollerslev 1987) and the generalized error distribution (Nelson 
1989, 1991). Figures 4.3 and 4.4 display examples of their density functions. 

The density function of the standardized t-distribution is determined by one 
parameter, the degrees-of-freedom \(\nu\); thus, 

\[ f(z \mid \nu) = c(\nu) \left[ 1 + \frac{z^2}{\nu - 2} \right]^{-(\nu + 1)/2} \tag{9.43} \]

with \(c(\nu)\) defined in terms of the gamma function, \(\Gamma(\cdot)\), by 

\[ c(\nu) = \frac{\Gamma(\tfrac{1}{2}(\nu + 1))}{\Gamma(\tfrac{1}{2}\nu) \sqrt{\pi(\nu - 2)}}. \tag{9.44} \]

The parameter \(\nu\) must exceed 2. The condition for a finite moment of order 
\(n\) is \(n < \nu\). In particular, the kurtosis is finite when \(\nu > 4\) and then equals 
\(3(\nu - 2)/(\nu - 4)\). As \(\nu \to \infty\), the density function converges to that of the 
standard normal distribution. The gamma function is defined by an integral as 

\[ \Gamma(u) = \int_0^{\infty} x^{u-1} e^{-x}\, dx, \quad u > 0. \]

Some useful results are \(\Gamma(\tfrac{1}{2}) = \sqrt{\pi}\), \(\Gamma(1) = 1\), \(\Gamma(u + 1) = u\Gamma(u)\), and \(\Gamma(n) = 
(n - 1)!\) for positive integers \(n\). 
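
The standardized t-density is straightforward to evaluate with the log-gamma function in Python's standard library. The sketch below is illustrative; the function names are hypothetical, and working with logarithms is a choice made here to avoid overflow when \(\nu\) is large.

```python
import math


def standardized_t_density(z, nu):
    """Density (9.43) of the standardized t-distribution with nu > 2."""
    if nu <= 2:
        raise ValueError("nu must exceed 2")
    # log of c(nu) from equation (9.44)
    log_c = (math.lgamma(0.5 * (nu + 1)) - math.lgamma(0.5 * nu)
             - 0.5 * math.log(math.pi * (nu - 2)))
    return math.exp(log_c - 0.5 * (nu + 1) * math.log1p(z * z / (nu - 2)))


def standardized_t_kurtosis(nu):
    """Kurtosis 3(nu - 2)/(nu - 4); infinite when nu <= 4."""
    if nu <= 4:
        return math.inf
    return 3.0 * (nu - 2) / (nu - 4)
```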

The density function of the generalized error distribution (GED) is also determined 
by one parameter, the tail-thickness \(\eta\); thus, 

\[ f(z \mid \eta) = C(\eta) \exp\left( -\tfrac{1}{2} \left| \frac{z}{\lambda(\eta)} \right|^{\eta} \right) \tag{9.45} \]

with 

\[ \lambda(\eta) = \left[ \frac{2^{-(2/\eta)}\, \Gamma(\eta^{-1})}{\Gamma(3\eta^{-1})} \right]^{1/2} \quad \text{and} \quad C(\eta) = \frac{\eta}{2^{1 + \eta^{-1}}\, \Gamma(\eta^{-1})\, \lambda(\eta)}. \tag{9.46} \]

The parameter \(\eta\) is positive. The distribution is standard normal when \(\eta = 2\) 
and has thicker tails than the normal when \(\eta < 2\); it is double exponential when 
\(\eta = 1\). All the moments of the distribution are finite and the kurtosis equals 
\(\Gamma(\eta^{-1})\Gamma(5\eta^{-1})/\Gamma(3\eta^{-1})^2\). 
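
The GED density and its kurtosis can be evaluated in the same way. This Python sketch assumes the standard unit-variance parametrization of Nelson (1991) for \(\lambda(\eta)\) and \(C(\eta)\); the function names are hypothetical, and `math.gamma` will overflow for very small \(\eta\), so the sketch is intended for moderate values such as \(1 \le \eta \le 2\).

```python
import math


def ged_lambda(eta):
    """Scale term lambda(eta) that gives the GED unit variance."""
    return math.sqrt(2.0 ** (-2.0 / eta) * math.gamma(1.0 / eta) / math.gamma(3.0 / eta))


def ged_density(z, eta):
    """Density of the generalized error distribution with tail-thickness eta > 0."""
    lam = ged_lambda(eta)
    c = eta / (2.0 ** (1.0 + 1.0 / eta) * math.gamma(1.0 / eta) * lam)
    return c * math.exp(-0.5 * abs(z / lam) ** eta)


def ged_kurtosis(eta):
    """Kurtosis of the GED: Gamma(1/eta) Gamma(5/eta) / Gamma(3/eta)^2."""
    return math.gamma(1.0 / eta) * math.gamma(5.0 / eta) / math.gamma(3.0 / eta) ** 2
```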

Both the above densities can be represented as mixtures of normal distributions 
(Praetz 1972; Hsu 1982) and hence standardized returns can be factorized as 

\[ z_t = m_t^{1/2} u_t, \quad \text{with } E[m_t] = 1 \text{ and } u_t \sim N(0, 1). \]

It is then possible to fit ARCH models into the information arrivals framework 
of Section 8.3, with \(m_t\) measuring the news arrivals in period \(t\) divided by the 
number expected at time \(t - 1\) (Taylor 1994a). 

Further possibilities for the distribution D(0, 1) are the two-parameter gener- 
alized t-distribution of Bollerslev et al. (1994, p. 3018) and the nonparametric 
specification of Engle and González-Rivera (1991). 


9.6.2 Estimation 


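Parameters such as the degrees-of-freedom \(\nu\) or the tail-thickness \(\eta\) are included 
in the parameter vector \(\theta\). Typically, \(h_t\) and \(z_t\) are functions of some subset \(\theta^{*}\) of 
\(\theta\) and the density function of \(z_t\) is determined by the remaining elements, \(\theta^{\bullet}\), of 
\(\theta\). The conditional density of observation \(t\) is 

\[ f(r_t \mid I_{t-1}, \theta) = \frac{f(z_t(\theta^{*}) \mid \theta^{\bullet})}{\sqrt{h_t(\theta^{*})}} \tag{9.47} \]

and the log-likelihood function is now 

\[
\begin{aligned}
\log L(\theta) &= \sum_{t=1}^{n} \log f(r_t \mid I_{t-1}, \theta) \\
&= \sum_{t=1}^{n} \left[ -\tfrac{1}{2} \log(h_t(\theta^{*})) + \log f(z_t(\theta^{*}) \mid \theta^{\bullet}) \right].
\end{aligned}
\tag{9.48}
\]

Maximization of (9.48) provides the maximum likelihood estimate \(\hat{\theta}\) of all the 
parameters. 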


9.6.3 Nonnormal GARCH(1, 1) 


Many formulae for the stationary GARCH(1, 1) process do not depend on the 
distribution of the standardized residuals \(z_t\). These include the unconditional 
variance, the autocorrelations of the squared residuals when the kurtosis of returns is 
finite, and the forecasts of future volatility, provided by equations (9.11), (9.17), 
(9.18), and (9.20). The conditions for returns to have finite kurtosis, however, are 
firstly that 

\[ \text{kurtosis}(z_t) = E[z_t^4] = k_z \]

is finite and secondly that 

\[ (\alpha + \beta)^2 + \alpha^2 (k_z - 1) < 1, \tag{9.49} \]

proved by Nelson (1990a, equation 29). The kurtosis equals 

\[ \text{kurtosis}(r_t) = \frac{k_z [1 - (\alpha + \beta)^2]}{1 - (\alpha + \beta)^2 - \alpha^2 (k_z - 1)} \tag{9.50} \]

when it is finite (He and Teräsvirta 1999, equation 8). 
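
Condition (9.49) and formula (9.50) can be combined in one small function. The Python sketch below is illustrative and its function name is hypothetical; it returns infinity when the condition for finite kurtosis fails.

```python
import math


def garch11_return_kurtosis(alpha, beta, k_z=3.0):
    """Kurtosis of returns from (9.50); math.inf when condition (9.49) fails.

    k_z is the kurtosis of the standardized residuals (3 for the normal distribution).
    """
    persistence_sq = (alpha + beta) ** 2
    if not persistence_sq + alpha**2 * (k_z - 1) < 1:
        return math.inf
    return k_z * (1 - persistence_sq) / (1 - persistence_sq - alpha**2 * (k_z - 1))


# Estimates for the DM/$ rate with conditional normal distributions.
kurtosis_dm = garch11_return_kurtosis(0.0354, 0.9554)
```

With the DM/$ estimates the result is finite, in line with the statement in Section 9.4.3, and it moderately exceeds the normal value of 3.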


9.7 Asymmetric Volatility Models 


The conditional variance of future asset prices is not a symmetric function of 
changes in prices at some markets, particularly at equity markets. An x% rise in 
the price today typically has a different impact on future volatility to an x% fall in 
the price today, whatever the value of x. Nelson (1991) shows that a fall in the US 
stock market has a much larger impact on the next day’s volatility than a rise of 
the same magnitude. In this situation, squared residuals, $e_{t-1}^2 = (r_{t-1} - \mu_{t-1})^2$,
do not provide all the relevant new information at time $t-1$ about volatility at
time $t$. Instead, there is some additional information in $e_{t-1}$.

Much research has focused on the sign of the residual $e_{t-1}$, which is identical
to the sign of the standardized residual $z_{t-1}$. The additional information is then
summarized by the indicator variable

$$S_{t-1} = \begin{cases} 1 & \text{if } e_{t-1} < 0, \\ 0 & \text{if } e_{t-1} \ge 0. \end{cases} \tag{9.51}$$

Sometimes $e_{t-1}$ in (9.51) is replaced by the return $r_{t-1}$ in empirical work, which
makes little difference to results because conditional means $\mu_t$ are very near
zero. Engle and Ng (1993) describe hypothesis tests that can be used to decide
if volatility is an asymmetric function. Their sign bias tests involve regressing $z_t^2$
on explanatory variables such as $S_{t-1}$ and $S_{t-1} e_{t-1}$. A significant t-ratio from an
ordinary least squares regression is then evidence of asymmetry.

We now consider a popular asymmetric specification for the conditional
variance $h_t$, followed by numerical examples in Section 9.8. Further asymmetric
specifications, a summary of the magnitude of the asymmetric effect, and a review of
explanations for this effect follow later in Section 10.2.


9.7.1 GJR-GARCH


The GARCH(1, 1) model states that $\mu_t$ is a constant and $h_t$ is a linear function
of $e_{t-1}^2$ and $h_{t-1}$. Asymmetry can be introduced by weighting $e_{t-1}^2$ differently for
negative and positive residuals; thus,

$$h_t = \omega + \alpha e_{t-1}^2 + \alpha^{-} S_{t-1} e_{t-1}^2 + \beta h_{t-1}. \tag{9.52}$$

This is a straightforward way to model asymmetry, here called the GJR(1, 1)
model following the work of Glosten, Jagannathan, and Runkle (1993). The
squared residual is multiplied by $\alpha + \alpha^{-}$ when the return is below its
conditional expectation ($S_{t-1} = 1$) and by $\alpha$ when the return is above or equal to the
expected value ($S_{t-1} = 0$). The parameters are usually constrained by $\omega > 0$,
$\alpha > 0$, $\alpha + \alpha^{-} \ge 0$, and $\beta > 0$. The GJR(p, q) model is defined by adding p
terms to the right side of equation (9.36), so that

$$h_t = \omega + \sum_{i=1}^{p} (\alpha_i + \alpha_i^{-} S_{t-i}) e_{t-i}^2 + \sum_{j=1}^{q} \beta_j h_{t-j}.$$
To obtain theoretical results we will assume that the standardized residuals have
symmetric, continuous distributions. Then $E[S_{t-1}] = \tfrac12$, and $S_{t-1}$ is independent
of $e_{t-1}^2$. Some formulae for the GJR(1, 1) model can be obtained by replacing $\alpha$
in the corresponding GARCH(1, 1) formulae by the average weight on the term
$e_{t-1}^2$, i.e. by $\alpha + \tfrac12\alpha^{-}$. For example, the persistence parameter of the process can
be deduced by writing

$$h_{t+1} = \omega + ((\alpha + \alpha^{-} S_t) z_t^2 + \beta) h_t \tag{9.53}$$

and then taking expectations at time $t-1$:

$$E[h_{t+1} \mid I_{t-1}] = \omega + (\alpha + \tfrac12\alpha^{-} + \beta) h_t.$$

Therefore, the persistence parameter is $\phi = \alpha + \tfrac12\alpha^{-} + \beta$. Also, forecasts of
volatility are provided by (9.20) with $\alpha$ replaced by $\alpha + \tfrac12\alpha^{-}$.

The process is both covariance stationary and strictly stationary when $\phi < 1$
and then the unconditional variance is $\sigma^2 = \omega/(1 - \phi)$. Replacing $\alpha$ in the
GARCH(1, 1) equations by $\alpha + \tfrac12\alpha^{-}$ does not provide correct fourth moments.
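The persistence argument above translates directly into a forecasting rule. The sketch below assumes that equation (9.20) has the usual mean-reverting form, in which the expected variance moves towards $\sigma^2$ at the rate $\phi$; the function name and the parameter values are hypothetical.

```python
def gjr_variance_forecast(h_next, omega, alpha, alpha_minus, beta, horizon):
    """Expected conditional variance `horizon` periods ahead for GJR(1,1).

    h_next is the conditional variance for the next period (horizon = 1),
    which is already known at the forecast origin.
    """
    phi = alpha + 0.5 * alpha_minus + beta       # persistence parameter
    if not 0.0 <= phi < 1.0:
        raise ValueError("the forecast formula assumes a stationary process")
    sigma2 = omega / (1.0 - phi)                 # unconditional variance
    return sigma2 + phi ** (horizon - 1) * (h_next - sigma2)


# Hypothetical parameter values, chosen only for illustration.
omega, alpha, alpha_minus, beta = 1.0e-6, 0.01, 0.08, 0.93
h_next = 2.0e-4
forecasts = [gjr_variance_forecast(h_next, omega, alpha, alpha_minus, beta, k)
             for k in (1, 2, 20, 250)]
print(forecasts)
```

The two-period forecast equals $\omega + \phi h_{t+1}$, which is the expectation taken in the text, and long-horizon forecasts approach $\sigma^2 = \omega/(1-\phi)$.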
It can be shown that the kurtosis of returns for the GJR(1, 1) process is finite
when $k_z = E[z_t^4]$ is finite and

$$\gamma = (\alpha + \tfrac12\alpha^{-} + \beta)^2 + (\alpha + \tfrac12\alpha^{-})^2 (k_z - 1) + \tfrac14 (\alpha^{-})^2 k_z \tag{9.54}$$

is less than one. The kurtosis then equals

$$\text{kurtosis}(r_t) = k_z \, \frac{1 - (\alpha + \tfrac12\alpha^{-} + \beta)^2}{1 - \gamma} \tag{9.55}$$

and the autocorrelations of squared residuals, $s_t = (r_t - \mu_t)^2$, decrease
geometrically, at the persistence rate $\phi = \alpha + \tfrac12\alpha^{-} + \beta$; thus,

$$\rho_\tau = \text{cor}(s_t, s_{t+\tau}) = \phi^{\tau-1} \, \frac{(\phi - \beta)(1 - \beta\phi) + \delta\phi}{1 - 2\beta\phi + \beta^2 + \delta}, \quad \tau \ge 1, \tag{9.56}$$

with

$$\delta = \frac{k_z (\alpha^{-})^2}{4(k_z - 1)}.$$

These results can be derived from formulae for a family of extended GARCH(1, 1)
processes proved by He and Teräsvirta (1999, equations 8 and 21) and by Ling
and McAleer (2002).
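The fourth-moment results can be checked numerically. The sketch below evaluates (9.54) to (9.56) under the assumption of symmetric standardized residuals; the expression used for the first-lag autocorrelation follows from a direct calculation with (9.53) and reduces to the familiar GARCH(1, 1) result when $\alpha^{-} = 0$. Function names and parameter values are hypothetical.

```python
def gjr_fourth_moments(alpha, alpha_minus, beta, k_z=3.0):
    """Return (phi, gamma, kurtosis, rho_1) for a GJR(1,1) process.

    kurtosis and rho_1 are None when gamma >= 1 (no finite fourth moment).
    """
    a_bar = alpha + 0.5 * alpha_minus            # average weight on e^2
    phi = a_bar + beta                           # persistence
    gamma = (phi ** 2 + a_bar ** 2 * (k_z - 1.0)
             + 0.25 * alpha_minus ** 2 * k_z)    # equation (9.54)
    if gamma >= 1.0:
        return phi, gamma, None, None
    kurtosis = k_z * (1.0 - phi ** 2) / (1.0 - gamma)          # (9.55)
    delta = k_z * alpha_minus ** 2 / (4.0 * (k_z - 1.0))
    rho_1 = (((phi - beta) * (1.0 - beta * phi) + delta * phi)
             / (1.0 - 2.0 * beta * phi + beta ** 2 + delta))   # (9.56), lag 1
    return phi, gamma, kurtosis, rho_1


def gjr_autocorrelation(tau, alpha, alpha_minus, beta, k_z=3.0):
    """Autocorrelation of squared residuals at lag tau >= 1."""
    phi, _, _, rho_1 = gjr_fourth_moments(alpha, alpha_minus, beta, k_z)
    if rho_1 is None:
        return None
    return rho_1 * phi ** (tau - 1)


# Hypothetical parameter values: a symmetric and an asymmetric process.
print(gjr_fourth_moments(0.05, 0.00, 0.90))
print(gjr_fourth_moments(0.02, 0.06, 0.90))
```

For a fixed average weight $\alpha + \tfrac12\alpha^{-}$, a positive $\alpha^{-}$ adds the term $\tfrac14(\alpha^{-})^2 k_z$ to $\gamma$ and hence increases the kurtosis of returns.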


9.8 Equity Examples of Asymmetric Volatility Models 


One of the simplest asymmetric volatility models to estimate is the GJR(1, 1) 
model. This model can be estimated without difficulty using Excel software, 
by adapting the methods for the GARCH(1, 1) model presented in Section 9.4. 
Estimation and results are illustrated here for the daily observations of the S&P 
100-share index from January 1991 to December 2000, defined and graphed in 
Section 2.2. 

Researchers often specify the conditional mean to be a function of previous
returns, so we include both MA(1) and ARCH-M terms in the model to illustrate
how these features can be included in estimation software. The GJR(1, 1)-MA(1)-M
model estimated here is defined by combining equations (9.34), (9.35), and
(9.52). Prices $p_t$, returns (ignoring dividends) $r_t$, conditional means $\mu_t$,
conditional variances $h_t$, residuals $e_t$, and standardized residuals $z_t$ are connected by
the system of equations

$$r_t = \log(p_t/p_{t-1}) = \mu_t + e_t = \mu_t + h_t^{1/2} z_t, \tag{9.57}$$

$$\mu_t = \mu + \lambda h_t^{1/2} + \Theta e_{t-1}, \tag{9.58}$$

$$S_{t-1} = \begin{cases} 1 & \text{if } e_{t-1} < 0, \\ 0 & \text{if } e_{t-1} \ge 0, \end{cases} \tag{9.59}$$

and

$$h_t = \omega + \alpha e_{t-1}^2 + \alpha^{-} S_{t-1} e_{t-1}^2 + \beta h_{t-1}. \tag{9.60}$$

Readers who are not interested in the calculations for this model should skip to
Section 9.8.2, which presents the S&P 100 results.


9.8.1 Calculations 


Less detail about the calculations is given here than for the GARCH(1, 1) example 
presented in Section 9.4. The focus here is on those aspects of the calculations 
that are not required for the simpler GARCH(1, 1) model. 

Initially, we suppose the conditional distribution is normal and then there are
seven terms in the parameter vector,

$$\theta = (\mu, \lambda, \Theta, \omega, \alpha, \alpha^{-}, \beta)'. \tag{9.61}$$

The parameters are estimated by maximizing the log-likelihood, given by equation
(9.33), which is a function of $\theta$. This can be done directly or by maximizing over
a reparametrized vector. The results here are obtained by maximizing over

$$\theta^{*} = (10^3(\mu + \lambda s), \lambda, \Theta, \alpha, A, \phi, 10^4\sigma^2)' \tag{9.62}$$


with $s$ the sample standard deviation of the observed returns, $A = (\alpha + \alpha^{-})/\alpha$,
$\phi = \alpha + \tfrac12\alpha^{-} + \beta$, and $\sigma^2 = \omega/(1 - \phi)$. The first component of $\theta^{*}$ is
approximately a multiple of the conditional mean when the conditional variance is at its
median level.

Table 9.2. Formulae used in the GJR(1, 1)-MA(1)-M spreadsheets.

Cell    Formula

Exhibits 9.3–9.6
D26     =B3
D27     =$G$3+($G$4*SQRT(E27))+($G$5*F26)
E26     =B4*B4
E27     =$G$6+(($G$7+$G$8*H26)*F26*F26)+($G$9*E26)
F26     =C26-D26
G3      =0.001*G12-(G13*B4)
G4      =G13
G5      =G14
G6      =G18*(1-G17)/10000
G7      =G15
G8      =(G16-1)*G15
G9      =G17-G7-0.5*G8
G26     =F26/SQRT(E26)
H26     =IF(G26<0,1,0)

Exhibits 9.3 and 9.4
I26     =-0.5*LN(2*PI())-0.5*LN(E26)-0.5*G26*G26

Exhibit 9.5
B14     =G10
B15     =B14-2
B16     =(B14+1)/2
B17     =GAMMALN(B16)-GAMMALN(B14/2)-0.5*LN(PI()*B15)
G10     =IF(G19>0.00001,1/G19,"infinity")
I26     =$B$17-0.5*LN(E26)-$B$16*LN(1+(G26*G26/$B$15))

Exhibit 9.6
B14     =G10
B15     =GAMMALN(1/B14)
B16     =GAMMALN(3/B14)
B17     =0.5*(B15-B16)-(LN(2)/B14)
B18     =0.5*(B16-3*B15)+LN(B14/2)
G10     =G19
I26     =$B$18-0.5*LN(E26)-0.5*IF(ABS(G26)>0,EXP($B$14*(LN(ABS(G26))-$B$17)),0)
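The reparametrization (9.62) is inverted in the spreadsheet by the formulae for cells G3 to G9. A minimal sketch of the same mapping, together with its inverse, is given below; the function names are hypothetical and the value of $s$ in the usage lines is an illustrative assumption rather than the sample statistic.

```python
def theta_from_theta_star(theta_star, s):
    """Map the reparametrized vector (9.62) back to theta in (9.61)."""
    scaled_mean, lam, Theta, alpha, A, phi, scaled_var = theta_star
    mu = 0.001 * scaled_mean - lam * s           # cell G3
    omega = scaled_var * (1.0 - phi) / 10000.0   # cell G6
    alpha_minus = (A - 1.0) * alpha              # cell G8
    beta = phi - alpha - 0.5 * alpha_minus       # cell G9
    return (mu, lam, Theta, omega, alpha, alpha_minus, beta)


def theta_star_from_theta(theta, s):
    """Inverse mapping, from theta to the reparametrized vector."""
    mu, lam, Theta, omega, alpha, alpha_minus, beta = theta
    phi = alpha + 0.5 * alpha_minus + beta
    sigma2 = omega / (1.0 - phi)
    A = (alpha + alpha_minus) / alpha
    return (1000.0 * (mu + lam * s), lam, Theta, alpha, A, phi, 10000.0 * sigma2)


# Rounded normal-distribution estimates; s = 0.0095 is an assumed value.
theta = (-6.07e-4, 0.1377, 0.0110, 1.150e-6, 0.0108, 0.0869, 0.9324)
theta_star = theta_star_from_theta(theta, s=0.0095)
print(theta_star)
print(theta_from_theta_star(theta_star, s=0.0095))
```

Working with $A$, $\phi$, and $\sigma^2$ means that the stationarity and nonnegativity requirements become simple bounds on individual terms of $\theta^{*}$.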


Exhibit 9.3 shows some of the calculations for an initial guess at the optimal
value of $\theta^{*}$. The initial guess is in cells G12 to G18 and the corresponding vector
$\theta$ is in cells G3 to G9. The terms in $\theta$ are used to calculate the conditional means
and variances, etc. At time $t$ we know $\mu_t$ and $h_t$, because they are calculated from
information known at time $t-1$. We then use the return $r_t$ to calculate $e_t$, $z_t$, $S_t$, $l_t$,
$h_{t+1}$, and $\mu_{t+1}$, in that order. The quantity $l_t$ is the contribution of observation $t$ to
the log-likelihood function and is defined by equation (9.27). The initial values for
the mean and variance in cells D26 and E26 are given by the summary statistics
for the whole sample. After formulae are entered into cells F26, G26, H26, I26,
E27, and D27, all of the remaining cells can be filled by copying and pasting.
Table 9.2 provides a list of the relevant formulae.
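The column-by-column recursion can also be written as a short loop. The sketch below mirrors the order of calculation described above for conditional normal distributions. It is an illustration rather than a transcription of the spreadsheet: the function name is hypothetical, and the use of the divisor $n - 1$ for the initial variance is an assumption.

```python
import math


def gjr_ma1_m_loglik(returns, mu, lam, Theta, omega, alpha, alpha_minus, beta):
    """Log-likelihood of the GJR(1,1)-MA(1)-M model, conditional normal case.

    Returns the log-likelihood and the list of conditional variances h_t.
    """
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    mu_t, h_t = mean, var            # initial values (cells D26 and E26)
    loglik = 0.0
    h_path = []
    for r in returns:
        h_path.append(h_t)
        e = r - mu_t                                     # residual e_t
        z = e / math.sqrt(h_t)                           # standardized z_t
        S = 1.0 if e < 0.0 else 0.0                      # indicator S_t
        loglik += (-0.5 * math.log(2.0 * math.pi)
                   - 0.5 * math.log(h_t) - 0.5 * z * z)  # l_t
        h_next = omega + (alpha + alpha_minus * S) * e * e + beta * h_t
        mu_t = mu + lam * math.sqrt(h_next) + Theta * e  # mu_{t+1}
        h_t = h_next                                     # h_{t+1}
    return loglik, h_path


# A tiny artificial series, used only to show the call.
example_returns = [0.010, -0.020, 0.005, 0.000]
print(gjr_ma1_m_loglik(example_returns, 0.0, 0.1, 0.0, 1e-6, 0.01, 0.08, 0.93))
```

A numerical optimizer can then play the role of Solver by maximizing the first returned value over the parameters, subject to the constraints that keep the process stationary.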

The Excel tool Solver can find the maximum of the log-likelihood function.
Suitable constraints on $\theta^{*}$ when estimating the stationary GJR(1, 1) model are
$\alpha > 0.0001$, $A > 0$, $\phi < 0.9999$, and $\sigma^2 > 0$. The maximum likelihood estimates
are shown in Exhibit 9.4 and are discussed in Section 9.8.2.

Next suppose we are also interested in estimating the model with nonnormal
conditional distributions. We first assume the conditional distribution is the
standardized t-distribution with $\nu$ degrees of freedom. There are then eight parameters
to estimate and $\theta = (\mu, \lambda, \Theta, \omega, \alpha, \alpha^{-}, \beta, \nu)'$. As $\nu$ can be anywhere between 2
and infinity it is easier to estimate its reciprocal, which is between 0 and $\tfrac12$. Then
$\nu^{-1}$ is the eighth term in $\theta^{*}$ and the additional constraint $0.001 < \nu^{-1} < 0.499$
is appropriate when estimating the parameters using Solver. The only change that
needs to be made to the calculations concerns the terms $l_t$. From equations (9.43)
and (9.48),

$$l_t = -\tfrac12 \log(h_t) + \log(c(\nu)) - \frac{\nu + 1}{2} \log\left(1 + \frac{z_t^2}{\nu - 2}\right) \tag{9.63}$$

with $c(\nu)$ defined by (9.44). Exhibit 9.5 shows the result of maximizing the
log-likelihood function. Cells B14 to B17 are used to hold functions of $\nu$. In particular,
$\log(c(\nu))$ is in B17, with the values of $\log(\Gamma(\cdot))$ provided by the function
GAMMALN. The spreadsheet formulae for $l_t$ and $\log(c(\nu))$ are included in Table 9.2.

Similar methods provide the results when the conditional distribution is the
generalized error distribution with tail-thickness parameter $\eta$. The eighth term in
both $\theta$ and $\theta^{*}$ is $\eta$, and the constraint $\eta > 0.01$ is appropriate. The expression for
$l_t$ becomes

$$l_t = -\tfrac12 \log(h_t) + \log(C(\eta)) - \tfrac12 \left| z_t / \lambda(\eta) \right|^{\eta} \tag{9.64}$$

with $C(\eta)$ and $\lambda(\eta)$ defined by (9.46). Exhibit 9.6 shows the result of maximizing
the log-likelihood function. Cells B14–B18 contain functions of $\eta$. In particular,
$\log(C(\eta))$ and $\log(\lambda(\eta))$ are respectively in B18 and B17. Once more, relevant
spreadsheet formulae are included in Table 9.2.
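The three versions of $l_t$ can be compared side by side. The sketch below follows the I26 formulae listed in Table 9.2, with `math.lgamma` taking the place of GAMMALN; the function names are hypothetical.

```python
import math


def loglik_term_normal(h, z):
    """Contribution l_t for a conditional normal distribution."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * math.log(h) - 0.5 * z * z


def loglik_term_t(h, z, nu):
    """Contribution l_t from (9.63), standardized t with nu > 2."""
    log_c = (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
             - 0.5 * math.log(math.pi * (nu - 2.0)))
    return (-0.5 * math.log(h) + log_c
            - 0.5 * (nu + 1.0) * math.log(1.0 + z * z / (nu - 2.0)))


def loglik_term_ged(h, z, eta):
    """Contribution l_t from (9.64), GED with tail-thickness eta > 0."""
    lg1 = math.lgamma(1.0 / eta)
    lg3 = math.lgamma(3.0 / eta)
    log_lambda = 0.5 * (lg1 - lg3) - math.log(2.0) / eta
    log_C = 0.5 * (lg3 - 3.0 * lg1) + math.log(eta / 2.0)
    if z == 0.0:
        power = 0.0
    else:
        power = math.exp(eta * (math.log(abs(z)) - log_lambda))
    return -0.5 * math.log(h) + log_C - 0.5 * power


# An extreme standardized residual is far less surprising under fat tails.
h, z = 1.0e-4, 5.0
print(loglik_term_normal(h, z), loglik_term_t(h, z, 6.616), loglik_term_ged(h, z, 1.362))
```

Two limiting cases provide a check on the formulae: the GED term with $\eta = 2$ equals the normal term, and the t term approaches the normal term as $\nu$ increases.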


9.8.2 S&P 100 Index Results, 1991-2000 


Table 9.3 lists the GJR(1, 1)-MA(1)-M parameter estimates for the three condi- 
tional distributions, normal, t, and GED. Standard errors are shown in brackets 
and are calculated using the methods described later in Section 10.4. 
Table 9.3. Parameter estimates for a GJR model estimated from the S&P 100 index. (The
GJR(1, 1)-MA(1)-M model is defined by equations (9.57)–(9.60). The conditional
distributions are either normal or nonnormal as defined in Section 9.6. The data are the daily
values of the S&P 100 index from January 1991 to December 2000. Standard errors are shown
in brackets.)

                                          Distribution
Parameter                      Normal          t               GED
mu x 10^4                      -6.07           -2.56           -3.31
                               (5.44)          (4.45)          (4.35)
lambda                         0.1377          0.1112          0.1092
                               (0.0687)        (0.0577)        (0.0566)
Theta                          0.0110          -0.0111         -0.0163
                               (0.0203)        (0.0207)        (0.0199)
omega x 10^6                   1.150           0.752           0.842
                               (0.338)         (0.214)         (0.217)
alpha                          0.0108          0.0123          0.0119
                               (0.0097)        (0.0106)        (0.0113)
alpha^-                        0.0869          0.0784          0.0789
                               (0.0219)        (0.0153)        (0.0151)
beta                           0.9324          0.9398          0.9389
                               (0.0114)        (0.0088)        (0.0088)
nu                                             6.616
                                               (0.771)
eta                                                            1.362
                                                               (0.020)
A = (alpha + alpha^-)/alpha    9.043           7.383           7.658
phi = alpha + 0.5 alpha^-      0.9867          0.9913          0.9902
      + beta                   (0.0054)        (0.0041)        (0.0042)
sigma^2 x 10^4                 0.8627          0.8654          0.8614
log(L)                         8430.94         8495.13         8487.82


Figure 9.5. Density estimates for standardized residuals. Horizontal axis: standardized residual, z.

Most of the estimates are similar across the distributions and the numbers
discussed in this paragraph are for the normal specification. Of primary interest
are the estimates of $\alpha^{-}$ in comparison with the estimates of $\alpha$. With $\hat{\alpha}^{-} = 0.0869$
and $\hat{\alpha} = 0.0108$, negative residuals have much more impact on the conditional
variance than do positive residuals. The squares of negative and positive residuals
are respectively multiplied by 0.0977 and 0.0108. The estimated asymmetry ratio
is remarkably high, being $\hat{A} = 0.0977/0.0108 = 9.0$. The review of asymmetry
ratios in Section 10.2 will show that the level of estimated asymmetry for 1991–2000
is higher than for previous time periods. Adding half of $\hat{\alpha}^{-}$ to $\hat{\alpha}$ and to
the estimate $\hat{\beta} = 0.9324$ gives the persistence estimate $\hat{\phi} = 0.9867$. Variance
forecasts then have half-life $H = \log(0.5)/\log(\hat{\phi}) = 52$ trading periods, which
is two and a half months. The estimates of the three parameters that determine the
conditional mean are $\hat{\Theta} = 0.0108$, which is insignificantly different from zero,
$\hat{\mu} = -6.073 \times 10^{-4}$, and $\hat{\lambda} = 0.1377$. The positive sign for $\hat{\lambda}$ is plausible, while
the negative sign for $\hat{\mu}$ is not; the null hypothesis $\lambda = 0$ is tested in Section 10.5
and just rejected at the 5% level.
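The derived quantities in this paragraph follow from a few lines of arithmetic. The sketch below uses the rounded estimates for the normal specification in Table 9.3, so the results can differ slightly in the final digit from values calculated from unrounded estimates.

```python
import math

# Rounded estimates for the conditional normal specification (Table 9.3).
omega, alpha, alpha_minus, beta = 1.150e-6, 0.0108, 0.0869, 0.9324

weight_negative = alpha + alpha_minus            # multiplies squared negative residuals
asymmetry_ratio = weight_negative / alpha        # A
persistence = alpha + 0.5 * alpha_minus + beta   # phi
half_life = math.log(0.5) / math.log(persistence)
sigma2 = omega / (1.0 - persistence)             # unconditional variance

print(round(weight_negative, 4), round(asymmetry_ratio, 2))
print(round(persistence, 4), round(half_life), sigma2)
```

With 253 trading days per year there are about 21 trading days per month, so a half-life of 52 days is roughly two and a half months.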

The levels of kurtosis calculated from the standardized residuals are 5.50, 5.57,
and 5.55 for the three conditional distributions. The standard errors of these
estimates are 0.10 when the conditional distribution is normal and hence that
hypothesis is unsustainable. The estimated shape parameters are $\hat{\nu} = 6.62$ for the
t-distribution and $\hat{\eta} = 1.362$ for the GED, implying kurtosis levels of 5.29 and
4.13 for the standardized residuals. The t-distribution matches the sample
kurtosis better and it also has a higher maximum value for the log-likelihood than
the GED. Figure 9.5 shows a kernel estimate of the density of the standardized
residuals from the normal specification (solid curve), for comparison with the
standard normal density, the estimated t density and the estimated GED (three
dotted curves); the kernel estimate is defined by equation (4.6) with bandwidth
equal to 0.2. The nonnormal densities provide a much better fit to the kernel
density than the normal density. The maxima of the densities are 0.40 (normal), 0.46
(t), 0.50 (kernel), and 0.51 (GED).

The condition for the models to have finite kurtosis for the returns is $\gamma < 1$,
with $\gamma$ defined by (9.54). The estimates of $\gamma$ are 1.002 for the t-distribution and
0.995 for the GED, indicating that any appropriate model may have an infinite
fourth moment for returns.
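The implied kurtosis levels and the two values of the quantity defined by (9.54) can be reproduced from the rounded estimates in Table 9.3. The sketch below relies on the standard kurtosis formulae for the standardized t-distribution (valid for more than four degrees of freedom) and for the GED; the function names are hypothetical.

```python
import math


def kurtosis_t(nu):
    """Kurtosis of a standardized t-distribution; requires nu > 4."""
    return 3.0 * (nu - 2.0) / (nu - 4.0)


def kurtosis_ged(eta):
    """Kurtosis of a generalized error distribution."""
    return (math.gamma(1.0 / eta) * math.gamma(5.0 / eta)
            / math.gamma(3.0 / eta) ** 2)


def gjr_gamma(alpha, alpha_minus, beta, k_z):
    """Quantity (9.54); returns have finite kurtosis when it is below one."""
    a_bar = alpha + 0.5 * alpha_minus
    return ((a_bar + beta) ** 2 + a_bar ** 2 * (k_z - 1.0)
            + 0.25 * alpha_minus ** 2 * k_z)


k_t = kurtosis_t(6.616)
k_ged = kurtosis_ged(1.362)
gamma_t = gjr_gamma(0.0123, 0.0784, 0.9398, k_t)
gamma_ged = gjr_gamma(0.0119, 0.0789, 0.9389, k_ged)
print(round(k_t, 2), round(k_ged, 2))
print(round(gamma_t, 3), round(gamma_ged, 3))
```

Both values are so close to one that the sampling error in the parameter estimates is enough to leave the existence of the fourth moment undecided.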

Figure 9.6. S&P 100 index volatility, 1991–2000.

Figure 9.6 shows the ten-year time series of volatility estimates from 1991
to 2000 given by the annualized conditional standard deviations, $\sigma_t = \sqrt{253 h_t}$;
the scaling constant (253) is the average number of returns per annum in the
time series. These estimates of $\sigma_t$ are for the conditional normal specification
and differ only slightly from the estimates for the t-specification. The volatility
estimates are plotted as percentages and range from 7.3% in May 1995 to 47.1%
in September 1998. Half of the estimates are inside the interquartile range, from
10.5% to 17.6%. The median and mean values are 13.0% and 14.6% respectively.
Figure 9.6 shows that the second half of the returns series was more volatile than
the first half, with average levels of 11.3% from 1991 to 1995 and 17.9% from
1996 to 2000. This might be evidence either that the index returns process was not
stationary throughout the decade or that the volatility process has a long memory.
Ignoring the MA(1) term, the conditional mean is determined by the conditional
variance as $\mu_t = \mu + \lambda h_t^{1/2}$. The annualized values, $253\mu_t$, vary considerably,
with minimum 0.7% and interquartile range from 7.7% to 23.2%.
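The annualization used in this paragraph can be sketched as follows. The function names are hypothetical, the parameter values are the rounded normal-distribution estimates from Table 9.3, and the daily variance in the usage lines is simply the one that corresponds to the median volatility of 13.0%.

```python
import math

PERIODS_PER_YEAR = 253   # average number of returns per annum in the series


def annualized_volatility_pct(h):
    """Annualized conditional standard deviation, as a percentage."""
    return 100.0 * math.sqrt(PERIODS_PER_YEAR * h)


def annualized_mean_pct(h, mu, lam):
    """Annualized conditional mean 253 * (mu + lam * sqrt(h)), as a percentage."""
    return 100.0 * PERIODS_PER_YEAR * (mu + lam * math.sqrt(h))


mu_hat, lambda_hat = -6.07e-4, 0.1377
h_median = 0.130 ** 2 / PERIODS_PER_YEAR     # daily variance at 13.0% volatility

print(annualized_volatility_pct(h_median))
print(annualized_mean_pct(h_median, mu_hat, lambda_hat))
```

Because the conditional mean is linear in the conditional standard deviation, the ordering of the annualized means across days is the same as the ordering of the volatility estimates.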

Figures 9.7 and 9.8 show the annualized conditional standard deviations and 
the annualized percentage returns for one year of high volatility and for one year 
of low volatility. The scales are the same for the two figures. 

Figure 9.7 shows high levels of index volatility throughout nearly all of the year 
2000. The new millennium commences with volatility at 11% and then shoots up 
to 22% following a fall in the index of 3.8% on 4 January. Thereafter, volatility 
remains above the median level for the decade. It is, of course, the large market 
falls that increase the volatility of this series dramatically. The largest fall during 
the year was 6.0% on 14 April, which moves the volatility estimate from 25% 
to 39%. The fall was more than reversed after the weekend by rises of 4.1% and 
3.0% on 17 and 18 April, despite which volatility fell slightly because little weight 
is given by the estimated model to the squares of positive residuals. Five months
later volatility had declined to 13%, to be followed by a general increase in the
final quarter of the year. Falls of 3.0% and 3.8% on 15 and 20 December move
the volatility back up to 33%.

Figure 9.7. S&P 100 index volatility and returns in 2000. Vertical axis: annualized percentage conditional volatility and percentage returns; horizontal axis: months, January to December.

Figure 9.8. S&P 100 index volatility and returns in 1995.

Figure 9.8 shows much lower levels of volatility in 1995. The lowest level
of 7.3% is near the estimated lower bound for the model, which is equal to
$\sqrt{253\,\omega/(1 - \beta)} = 6.1\%$. It is followed by 10.9% on the next day, responding
to the lowest return during the year, equal to −1.6% on 18 May. The highest
volatility estimate is 12.9% on 21 December and hence volatility throughout the
year is below the ten-year median level.
9.9 Summary


ARCH models define conditional distributions for returns that are characterized by 
time-varying conditional variances. The parameters of these models can be esti- 
mated by maximizing the likelihood of observed returns and hence the volatility 
of returns can be calculated. Many choices can be made in selecting a model, so 
that an accurate description of the process generating observed returns becomes a 
realistic aspiration. Additional accuracy may require models that are more com- 
plex than those presented so far. Further models, likelihood theory, and methods 
for selecting a model are the major topics in the next chapter. 
ARCH Models: Selection and Likelihood Methods


Several additional ARCH models are described in this chapter. Methods for select- 
ing a model from the many possibilities are given, including hypothesis tests that 
use maximum likelihood estimates and their standard errors. 


10.1 Introduction 


There seem to be few limits to the complexity that can be built into an ARCH 
model. The simple examples described in Chapter 9 suffice for many purposes. 
There are, however, applications that require more complicated structures, as does 
the search for more accurate descriptions of observed returns. The exponential 
GARCH model of Nelson (1991) is another asymmetric volatility model. It is 
described in Section 10.2, where we consider asymmetric specifications and evi- 
dence in some detail. The long memory extension of EGARCH investigated by 
Bollerslev and Mikkelsen (1996) is presented in Section 10.3. It is a significant 
advance in the search for a better model. 

Deciding if one model is better than another often requires hypothesis tests that 
involve parameter estimates and their standard errors. The appropriate likelihood 
theory for estimates, standard errors, and tests is documented in Section 10.4, 
with examples from the research literature discussed in Section 10.5. The final 
selection of an ARCH model may either reflect pragmatic concerns or a belief that 
no further useful progress can be made. Section 10.6 provides details of diagnostic 
checks that are often used to assess the adequacy of a model. 

Section 10.7 mentions additional interesting specifications that are beyond the 
scope of this chapter, including multivariate models. Finally, Section 10.8 con- 
cludes the two ARCH chapters. 


10.2 Asymmetric Volatility: Further Specifications and Evidence 


We recall some notation: returns $r_t$ have conditional means and variances
respectively denoted by $\mu_t$ and $h_t$, the residuals are $e_t = r_t - \mu_t$, and the standardized
residuals are $z_t = e_t/\sqrt{h_t}$.


10.2.1 EGARCH


Another popular way to introduce asymmetry into conditional variances was
developed by Nelson (1991). He proposed ARMA models for the logarithm of
$h_t$. To appreciate why these might be plausible, consider the special case of the
integrated GARCH(1, 1) model (9.19) when $\omega = 0$; then

$$h_t = \alpha e_{t-1}^2 + (1 - \alpha) h_{t-1},$$

which implies

$$(h_t - h_{t-1})/h_{t-1} = \alpha (z_{t-1}^2 - 1).$$

When $\alpha$ is small, this process is similar to a random walk with i.i.d. steps for
$\log(h_t)$, because $\log(h_t) - \log(h_{t-1}) \cong (h_t - h_{t-1})/h_{t-1}$. Thus a family of processes
in some ways similar to GARCH(1, 1) can be defined by supposing that $\log(h_t)$ is an
AR(1) process with residuals that are an appropriate symmetric function of the
standardized residuals $z_{t-1}$. Nelson (1991) goes beyond this symmetric framework by
employing a particular asymmetric function $g(z_{t-1})$ for the volatility residuals.

Nelson's simplest stationary EGARCH model assumes returns have conditional
distributions that are normal with constant mean and with variances given by

$$\log(h_t) = \mu_{\log(h)} + \Delta(\log(h_{t-1}) - \mu_{\log(h)}) + g(z_{t-1}) \tag{10.1}$$

and

$$g(z_{t-1}) = \vartheta z_{t-1} + \gamma(|z_{t-1}| - \sqrt{2/\pi}). \tag{10.2}$$

The four variance parameters are the mean $\mu_{\log(h)}$ of the process $\log(h_t)$, the
autoregressive parameter $\Delta$, and the two parameters, $\vartheta$ and $\gamma$, that appear in the
function $g$. The terms $g(z_{t-1})$ have zero mean, because $\sqrt{2/\pi}$ is the expectation
of $|z_{t-1}|$ when $z_{t-1}$ has a normal distribution. The terms $g(z_{t-1})$ are also i.i.d.,
since the variables $z_{t-1}$ have the same properties. Thus equations (10.1) and (10.2)
define an AR(1) process for $\log(h_t)$ that is stationary when $-1 < \Delta < 1$. As $h_t$ is
the exponential of an AR(1) process, the model for returns is called EGARCH(1).

The volatility residual function $g(z)$ is defined by two straight lines that join
at $z = 0$. The function has slope $\vartheta - \gamma$ when $z$ is negative (the market falls) and
slope $\vartheta + \gamma$ when $z$ is positive (the market rises). Figure 10.1 shows $g(z)$ as dotted
lines when $\vartheta = -0.11$ and $\gamma = 0.22$, as estimated by Nelson (1989) for an index
of 90 US stocks from 1928 to 1956. The estimated function for a later period
is also shown, by solid lines, using $\vartheta = -0.12$ and $\gamma = 0.16$, from estimates
in Nelson (1991) for the CRSP value-weighted US market index from 1962 to
1987. It can be seen that a large standardized residual will increase conditional
volatility, but by much more when the market moves down.
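The shape of the volatility residual function and one step of the EGARCH(1) recursion can be sketched as follows, using the rounded estimates quoted above for the 1928 to 1956 index. The function names and the remaining inputs in the usage lines are hypothetical.

```python
import math

NORMAL_ABS_MEAN = math.sqrt(2.0 / math.pi)   # E|z| for a standard normal z


def g(z, vartheta, gamma, abs_mean=NORMAL_ABS_MEAN):
    """Volatility residual function (10.2)."""
    return vartheta * z + gamma * (abs(z) - abs_mean)


def egarch1_next_log_h(log_h, z, mu_log_h, Delta, vartheta, gamma):
    """One step of the AR(1) recursion (10.1) for log(h_t)."""
    return mu_log_h + Delta * (log_h - mu_log_h) + g(z, vartheta, gamma)


vartheta, gamma = -0.11, 0.22
for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(z, round(g(z, vartheta, gamma), 4))

# Hypothetical starting point: log-variance at its mean, then a two-sigma fall.
mu_log_h = math.log(1.0e-4)
print(egarch1_next_log_h(mu_log_h, -2.0, mu_log_h, 0.98, vartheta, gamma))
```

A fall of two standard deviations adds far more to the next log-variance than a rise of the same size, while a zero residual lowers it by $\gamma\sqrt{2/\pi}$.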

Figure 10.1. Examples of the function g(z).

EGARCH models with nonnormal conditional distributions are defined by a
small change to equation (10.2). The constant $\sqrt{2/\pi}$ is replaced by the expectation
of $|z_{t-1}|$ to ensure that the expectation of $g(z_{t-1})$ is zero; thus,

$$g(z_{t-1}) = \vartheta z_{t-1} + \gamma(|z_{t-1}| - E[|z_{t-1}|]). \tag{10.3}$$

The required expectation for the standardized t-distribution with $\nu$ degrees of
freedom is

$$E[|z_{t-1}|] = \frac{2\sqrt{\nu - 2}\;\Gamma[(\nu + 1)/2]}{\sqrt{\pi}\,(\nu - 1)\,\Gamma[\nu/2]} \tag{10.4}$$

while for the GED distribution, with thickness parameter $\eta$, it equals

$$E[|z_{t-1}|] = \lambda(\eta)\, 2^{1/\eta}\, \Gamma(2/\eta)/\Gamma(1/\eta).$$

The general EGARCH(p, q) model of Nelson (1991) represents $\log(h_t)$ as an
ARMA(p, q) process, with residuals defined by (10.3). He proves many results.
In particular, the returns process is strictly stationary if and only if the ARMA
process is strictly stationary. Returns are covariance stationary when they are
both strictly stationary and have finite variance. Strictly stationary processes with
conditional normal distributions or conditional GED distributions, with $\eta > 1$,
have all moments finite. However, conditional t-distributions typically have no
finite unconditional moments for returns. The autocorrelations of squared
residuals (when they exist) are given by very complicated formulae. A geometric decay
formula can be obtained for the EGARCH(1) model, for the autocorrelations of
the logarithms of squared residuals (when they exist), because

$$\log((r_t - \mu_t)^2) = \log(h_t) + \log(z_t^2)$$

is the sum of AR(1) and white noise processes (Taylor 1994a).
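The two expectations can be evaluated with log-gamma functions, which avoids overflow when the degrees of freedom are large. The sketch below takes $\lambda(\eta)$ to be the GED scale term used in the spreadsheet formulae of Table 9.2; the function names are hypothetical.

```python
import math


def abs_mean_t(nu):
    """E|z| for a standardized t-distribution with nu > 2, equation (10.4)."""
    if nu <= 2.0:
        raise ValueError("the standardized t-distribution requires nu > 2")
    log_ratio = math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
    return (2.0 * math.sqrt(nu - 2.0) * math.exp(log_ratio)
            / (math.sqrt(math.pi) * (nu - 1.0)))


def abs_mean_ged(eta):
    """E|z| for a standardized GED with thickness parameter eta > 0."""
    lg1 = math.lgamma(1.0 / eta)
    lg2 = math.lgamma(2.0 / eta)
    lg3 = math.lgamma(3.0 / eta)
    log_lambda = 0.5 * (lg1 - lg3) - math.log(2.0) / eta   # log of lambda(eta)
    return math.exp(log_lambda + math.log(2.0) / eta + lg2 - lg1)


print(math.sqrt(2.0 / math.pi))   # normal benchmark
print(abs_mean_t(6.616))          # shape estimate from Table 9.3
print(abs_mean_ged(1.362))        # shape estimate from Table 9.3
```

Both expectations equal $\sqrt{2/\pi}$ in the normal limit ($\nu \to \infty$ or $\eta = 2$), which provides a simple check on the formulae.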


10.2.2 Further Examples 


There are many more asymmetric volatility models in the literature (e.g. in
Hentschel 1995), three of which are mentioned here. Franses and van Dijk (2000)
discuss these and other specifications in some detail. The conditional variance in
the GJR(1, 1) model includes the squared residual multiplied by a function of the
residual; thus,

$$h_t = \omega + w(e_{t-1}) e_{t-1}^2 + \beta h_{t-1} \tag{10.5}$$

with

$$w(e_{t-1}) = \begin{cases} \alpha + \alpha^{-} & \text{if } e_{t-1} < 0, \\ \alpha & \text{if } e_{t-1} \ge 0. \end{cases} \tag{10.6}$$

Changing the weighting function to

$$w(e_{t-1}) = \alpha + \frac{\gamma}{e_{t-1}} \tag{10.7}$$

defines the quadratic GARCH(1, 1) model of Sentana (1995), that has

$$h_t = \omega + \gamma e_{t-1} + \alpha e_{t-1}^2 + \beta h_{t-1} = \omega - \frac{\gamma^2}{4\alpha} + \alpha\left(e_{t-1} + \frac{\gamma}{2\alpha}\right)^2 + \beta h_{t-1}.$$

This specification is more flexible in one respect than some others because it does
not assume that the next period’s variance is minimized when the latest residual
is zero; rather, $h_t$ is symmetric around $e_{t-1} = -\gamma/(2\alpha)$.

Function (10.6) changes abruptly at $e_{t-1} = 0$, while function (10.7) has
identical limits as $e_{t-1} \to \pm\infty$. Both drawbacks are avoided by a smooth transition
specification (Hagerud 1997; González-Rivera 1998), such as

$$w(e_{t-1}) = \alpha + \frac{\delta}{1 + \exp(\zeta e_{t-1})}. \tag{10.8}$$

The weight then changes monotonically as $e_{t-1}$ increases, from $\alpha + \delta$ (as $e_{t-1} \to -\infty$)
to $\alpha$ (as $e_{t-1} \to \infty$) and it equals the mid value $\alpha + \tfrac12\delta$ when $e_{t-1} = 0$. The
additional parameter $\zeta$ controls the rate at which the weights change. As $\zeta \to \infty$,
the GJR(1, 1) specification is obtained with $\alpha^{-} = \delta$.

Weighted combinations of absolute residuals can also be used to define
conditional variances. The threshold GARCH(1, 1) model of Zakoian (1994) defines
conditional standard deviations $h_t^{1/2}$ by

$$h_t^{1/2} = \omega + \alpha |e_{t-1}| + \alpha^{-} S_{t-1} |e_{t-1}| + \beta h_{t-1}^{1/2}, \tag{10.9}$$

with $S_{t-1}$ equal to 1 when $e_{t-1}$ is negative and equal to 0 otherwise. This is
similar to the GJR(1, 1) model, but with $h$ and $e^2$ in (9.52) respectively replaced
by $\sqrt{h}$ and $|e|$. The symmetric special case, when $\alpha^{-} = 0$, is the absolute value
GARCH(1, 1) model of Taylor (1986).
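The three weighting functions (10.6) to (10.8) can be compared directly. The sketch below is illustrative only: the function names and all parameter values are hypothetical, and the guard against very large exponents is an implementation detail rather than part of the model.

```python
import math


def w_gjr(e, alpha, alpha_minus):
    """GJR weight (10.6): a step at e = 0."""
    return alpha + alpha_minus if e < 0.0 else alpha


def w_quadratic(e, alpha, gamma):
    """Quadratic GARCH weight (10.7); undefined at e = 0."""
    return alpha + gamma / e


def w_smooth(e, alpha, delta, zeta):
    """Smooth transition weight (10.8)."""
    x = zeta * e
    if x > 700.0:          # exp(x) would overflow; the weight has reached alpha
        return alpha
    return alpha + delta / (1.0 + math.exp(x))


alpha, delta = 0.01, 0.08
for e in (-0.02, -0.005, 0.0, 0.005, 0.02):
    print(e, w_gjr(e, alpha, delta), round(w_smooth(e, alpha, delta, 500.0), 4))
```

With a very large rate parameter the smooth weight is indistinguishable from the GJR step away from zero, which is the limiting case noted in the text.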


10.2.3 News Impact Curves 


The various asymmetric volatility specifications provide different functional
relationships between the next period's conditional variance $h_t$ and the information
known at time $t-1$. The new information at time $t-1$ is summarized by the
residual $e_{t-1}$, while old information is summarized by $h_{t-1}$ and perhaps other
variables known at time $t-2$. The news impact curve of Engle and Ng (1993) is
defined by considering how $h_t$ varies with $e_{t-1}$, when all variables known at time
$t-2$ are replaced by their unconditional values. The notation $N(e_{t-1})$ is used to
represent the curve here.

Figure 10.2. News impact curve, GJR model, S&P 100 index.

For variations of GARCH(1, 1) it is sufficient to replace $h_{t-1}$ by the
unconditional variance $\sigma^2$. Consider, for example, the GJR(1, 1) model defined by (9.52).
Then

$$N(e_{t-1}) = h_t(e_{t-1} \mid h_{t-1} = \sigma^2) = \omega + \beta\sigma^2 + (\alpha + \alpha^{-} S_{t-1}) e_{t-1}^2. \tag{10.10}$$

This curve can be written more compactly as two quadratic functions, joined to
each other at the vertical axis; thus,

$$N(x) = \begin{cases} N(0) + (\alpha + \alpha^{-}) x^2 & \text{when } x < 0, \\ N(0) + \alpha x^2 & \text{when } x \ge 0, \end{cases} \tag{10.11}$$

with $N(0) = \omega + \beta\sigma^2$. Figure 10.2 is an example of the curve when the parameters
are the values given in Exhibit 9.4 for the S&P 100 index. For the EGARCH(1)
model, defined by (10.1) and (10.2), the corresponding curve is

$$N(x) = \begin{cases} N(0) \exp((\vartheta - \gamma) x/\sigma) & \text{when } x < 0, \\ N(0) \exp((\vartheta + \gamma) x/\sigma) & \text{when } x \ge 0, \end{cases} \tag{10.12}$$

with $N(0) = \sigma^{2\Delta} \exp[(1 - \Delta)\mu_{\log(h)} - \gamma\sqrt{2/\pi}]$ when the conditional
distribution is normal. Engle and Ng (1993) provide further examples and also document
a nonparametric estimate of the news impact curve. Franses and van Dijk (2000)
illustrate the curves for many specifications.
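Both news impact curves are simple to evaluate. The sketch below implements (10.11) and (10.12); the GJR values are the rounded normal-distribution estimates from Table 9.3, the EGARCH values in the usage lines are hypothetical apart from the two asymmetry parameters quoted in Section 10.2.1, and the function names are illustrative.

```python
import math


def nic_gjr(x, omega, alpha, alpha_minus, beta):
    """News impact curve (10.11) for the GJR(1,1) model."""
    phi = alpha + 0.5 * alpha_minus + beta
    sigma2 = omega / (1.0 - phi)                 # unconditional variance
    n0 = omega + beta * sigma2                   # N(0)
    weight = alpha + alpha_minus if x < 0.0 else alpha
    return n0 + weight * x * x


def nic_egarch1(x, sigma2, mu_log_h, Delta, vartheta, gamma):
    """News impact curve (10.12) for EGARCH(1), conditional normal case."""
    n0 = sigma2 ** Delta * math.exp((1.0 - Delta) * mu_log_h
                                    - gamma * math.sqrt(2.0 / math.pi))
    slope = vartheta - gamma if x < 0.0 else vartheta + gamma
    return n0 * math.exp(slope * x / math.sqrt(sigma2))


gjr = dict(omega=1.150e-6, alpha=0.0108, alpha_minus=0.0869, beta=0.9324)
for x in (-0.04, -0.02, 0.0, 0.02, 0.04):
    annualized = 100.0 * math.sqrt(253.0 * nic_gjr(x, **gjr))
    print(x, round(annualized, 1))
```

Printing the curve as an annualized percentage standard deviation makes the asymmetry easier to judge than the raw daily variances.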


10.2.4 How Much Asymmetry? 


The magnitude of the asymmetric volatility effect can be measured by the ratio of
the weights given to positive and negative residuals that have the same absolute
value. For the GJR(1, 1) model, the squares of negative residuals are multiplied by
$\alpha + \alpha^{-}$ while the squares of positive residuals are multiplied by $\alpha$. The asymmetry
ratio is then defined as

$$A = \frac{\alpha + \alpha^{-}}{\alpha}. \tag{10.13}$$

Likewise, for the EGARCH(1) model, standardized absolute residuals are
multiplied by either $\gamma - \vartheta$ or $\gamma + \vartheta$ so that the asymmetry ratio is

$$A = \frac{\gamma - \vartheta}{\gamma + \vartheta}. \tag{10.14}$$

These ratios can be related to the news impact curve $N(x)$ and its first derivative
$N'(x)$. For GJR(1, 1),

$$A = \frac{N(-x) - N(0)}{N(x) - N(0)} = \frac{-N'(-x)}{N'(x)}$$

for all positive $x$, while the same relation is a good approximation for EGARCH(1)
when $x$ is near zero. For other specifications, the function

$$A(x) = \frac{-N'(-x)}{N'(x)}$$

can be used to attempt to summarize the asymmetric effect.
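The ratios (10.13) and (10.14) can be evaluated from the rounded estimates quoted earlier. In the sketch below, the closed form used for the EGARCH(1) function $A(x)$ is obtained by differentiating (10.12); it is a derivation added here for illustration and the function names are hypothetical.

```python
import math


def asymmetry_ratio_gjr(alpha, alpha_minus):
    """Asymmetry ratio (10.13)."""
    return (alpha + alpha_minus) / alpha


def asymmetry_ratio_egarch(vartheta, gamma):
    """Asymmetry ratio (10.14)."""
    return (gamma - vartheta) / (gamma + vartheta)


def asymmetry_function_egarch(x, sigma, vartheta, gamma):
    """A(x) = -N'(-x)/N'(x) for the EGARCH(1) curve (10.12), with x > 0."""
    return (asymmetry_ratio_egarch(vartheta, gamma)
            * math.exp(-2.0 * vartheta * x / sigma))


print(asymmetry_ratio_gjr(0.0108, 0.0869))    # S&P 100, 1991-2000, normal case
print(asymmetry_ratio_egarch(-0.11, 0.22))    # rounded estimates, 1928-1956
print(asymmetry_ratio_egarch(-0.12, 0.16))    # rounded estimates, 1962-1987
print(asymmetry_function_egarch(0.001, 0.01, -0.12, 0.16))
```

Ratios calculated from rounded parameter values can differ from ratios calculated from unrounded estimates, particularly when the denominator $\gamma + \vartheta$ is small.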

Estimates of A can be inferred from the results in many papers, although
the models estimated are often extensions of those mentioned above. Estimates
greater than one are frequently found for equity markets. There is no evidence,
however, for asymmetric volatility at foreign exchange markets (Taylor 1994a).
As DM/$ and $/DM volatility are identical, symmetric specifications are then
plausible.

Estimates for US stock indices are more than one for samples of daily returns
throughout the twentieth century. The estimates of A vary considerably, perhaps
reflecting the difficulty of estimating A precisely. The pioneering work of Nelson
(1989, 1991) provides A = 3.2 for the Standard 90-share index from 1928 to 1956
and A = 7.2 for the CRSP value-weighted market index from 1962 to 1987. Brock
et al. (1992) estimate A = 2.3 for a ninety-year Dow Jones Industrial Average
series from 1896 to 1986. Bollerslev et al. (1994) present complicated news impact
curves that include more parameters than the examples discussed above. Their
results indicate that A(x) is very approximately 2 in each of four periods, 1885–1914,
1914–1928, 1928–1952, and 1953–1990. Bollerslev and Mikkelsen (1996,
1999) estimate long memory EGARCH models for the S&P 500 index and have
A = 3.1 from 1953 to 1990 and A = 6.0 from 1961 to 1991. Evidence for
asymmetric effects is also found in monthly US returns from 1815 to 1925, by
Goetzmann, Ibbotson, and Peng (2001).

Next, considering more recent years, Blair et al. (2002) show that one extreme
return can have a substantial impact on the estimate of A. They find A = 4.1
for the S&P 100-share index from 1983 to 1992 and the lower estimate A = 2.1
when a dummy variable is used to remove the crash return on 19 October 1987.
For the same S&P index, Blair et al. (2001a) report the high value A = 8.5 for
the GJR model from 1993 to 1998 while Taylor (2002) finds A = 5 for a long
memory EGARCH model estimated from 1989 to 1998.

Asymmetric volatility effects have been found for individual US firms by
Christie (1982), Cheung and Ng (1992), Duffee (1995), and Jubinski and
Tomljanovich (2003). Blair et al. (2002) compare volatility results from GJR(1, 1)
models for the S&P 100 index with results for all firms that were included in the
index at some time during the same decade, from 1983 to 1992. The median
estimate of A is 2.3, compared with 4.1 for the index, with the estimated asymmetry
for the index exceeding that for 83% of the firms. The estimates of A exceed 1
(i.e. $\alpha^{-} > 0$) for 95% of the firms, although only 14% of the estimates reject the
hypothesis of symmetry (i.e. A = 1, $\alpha^{-} = 0$) at the 5% level.

There is also convincing evidence for asymmetric equity volatility in Japan.
Engle and Ng (1993) present many results for the Japanese TOPIX index from
1980 to 1988, including A = 2.6 for the GJR model and A = 1.8 for the
EGARCH model. Their evidence for asymmetry remains significant when the
series of returns is truncated to exclude the 1987 crash. Bekaert and Wu (2000),
for the overlapping period from 1985 to 1994, find A = 2.8 as part of a
multivariate model for the Nikkei 225 index and three portfolios.

The evidence is less consistent for the UK. Poon and Taylor (1992) report 
A = 1.3 for the UK FT All-Share index from 1969 to 1989. Taylor (2000) gives 
estimates for indices and twelve UK firms from 1972 to 1991. The estimates for 
the firms range from 1 to 1.6 and the majority do not reject the hypothesis A = 1 
at the 5% level. In contrast, substantial asymmetry (A = 5.4) is estimated for the 
FTSE 100 index in Section 16.2, for the later period from 1992 to 2002. 


10.2.5 Explanations 


Any economic explanation of asymmetric volatility effects cannot rely on features 
of modern trading habits, such as the demand for portfolio insurance, because 
effects are found throughout the last century. The asymmetric effect has often 
been referred to as a “leverage effect,” following Black (1976b), who noted that 
volatility rises when the market falls and debt/equity ratios increase. However, 
the asymmetric effect is large while daily changes in leverage are small. Duffee 
(1995) uses simple methods applied to all 2494 US firms included in a set of 
CRSP tapes to show that the degree of asymmetry is related to neither debt/equity 
ratios nor firm size. Bekaert and Wu (2000) show asymmetry is not related to 
changes in leverage for Japanese portfolios. Their detailed study provides the 
explanation of covariance asymmetry for the asymmetry in firm volatility: negative 
market shocks increase conditional covariances between market and stock 
returns substantially, unlike positive shocks. 

Another explanation has been called the “volatility feedback effect” and involves 
a contemporaneous negative relationship between returns and volatility. 
Assuming volatility risk is priced, an increase in volatility will raise the required 
equity return and cause an immediate price decline. Campbell and Hentschel 
(1992) develop a price model that displays volatility feedback, with the dividend 
shock being their only state variable. Wu (2001) extends the modeling frame- 
work to two state variables, dividend growth and dividend volatility, and provides 
empirical evidence for weekly and monthly market returns. Both dividend news 
and volatility feedback are found to be important components of the process that 
generates returns. 

A trading explanation for asymmetric effects in daily volatility at the level of the 
firm is investigated by Avramov, Chordia, and Goyal (2004). They use transaction 
databases to identify buyer and seller initiated trades in their study of 2232 NYSE 
firms. They then show that selling activity governs the asymmetric effect. Their 
results support their argument that “herding” or uninformed traders sell when 
prices fall, leading to an increase in volatility, while “contrarian” or informed 
traders sell after prices rise, leading to a reduction in volatility. 


10.3 Long Memory ARCH Models 


The specific examples of ARCH models presented so far can all explain the 
stylized fact that the autocorrelations of both absolute returns and squared returns 
are positive. However, the rate at which such autocorrelations decay towards zero 
may be incompatible with GARCH and similar models. Stationary GARCH and 
EGARCH models have a property known as short memory and, in particular, the 
theoretical autocorrelations $\rho_\tau$ of conditional variances and squared returns are 
then geometrically bounded, i.e. $|\rho_\tau| \le C\phi^\tau$ for some $C > 0$ and $1 > \phi > 0$. 
Empirical autocorrelations for absolute returns and squared returns provide 
evidence that their theoretical counterparts decay more slowly so that they are 
not geometrically bounded (Dacorogna, Müller, Nagler, Olsen, and Pictet 1993; 
Ding et al. 1993; Bollerslev and Mikkelsen 1996). A long memory model is 
then appropriate. Further evidence for long memory effects comes from high- 
frequency data, as we will see later in Chapter 12, where more detail is provided 
about the mathematics of long memory processes. 

Short memory ARCH models are typically special cases of long memory ARCH 
models. The special cases correspond to setting a long memory parameter d to 
zero. It is rather surprising that there are relatively few studies of long memory 
specifications, because estimates of d usually reject the null hypothesis d = 0 
comprehensively. This is relevant when pricing options because prices are shown 
to be sensitive to the parameter d in Section 14.7. 

As previously discussed in Section 3.8, long memory models are usually defined 
by applying the filter $(1 - L)^d$ to a process and then assuming the filtered 
process is a stationary ARMA(p, q) process. The lag operator L shifts any 
process $\{y_t\}$ backwards by one time period, so $L y_t = y_{t-1}$, while the differencing 
parameter d is between zero and one for volatility applications. The filter then 
represents fractional differencing and it is defined by the binomial expansion 


$$(1 - L)^d = 1 - dL + \frac{d(d-1)}{2!}L^2 - \frac{d(d-1)(d-2)}{3!}L^3 + \cdots. \tag{10.15}$$


The infinite series cannot be simplified when 0 < d < 1, which may explain why 
long memory ARCH studies are sparse. 

Baillie (1996) and Bollerslev and Mikkelsen (1996) both show how to use the 
filter $(1 - L)^d$ to define a long memory process for the conditional variance $h_t$, by 
making either the GARCH or EGARCH model more general. The GARCH 
generalization cannot be recommended because the returns process then has infinite 
variance for all positive values of d, which is incompatible with the stylized facts 
for asset returns. The EGARCH generalization may not have this drawback, as 
then $\log(h_t)$ is covariance stationary for $d < 1/2$ and it may be conjectured that the 
returns process has finite variance for some specifications and parameter values. 


10.3.1 FIEGARCH(1, d, 1) 


From Section 10.2, the EGARCH(1, 1) model defines the conditional variance 
by 


$$\log(h_t) = \mu_{\log(h)} + (1 - \Delta L)^{-1}(1 + \psi L)\,g(z_{t-1})$$


and 


$$g(z_{t-1}) = \vartheta z_{t-1} + \gamma\left(|z_{t-1}| - E[|z_{t-1}|]\right). \tag{10.16}$$


Inserting the additional filter $(1 - L)^{-d}$ into the function of L that multiplies past 
values of the volatility shocks $g(z_{t-1})$ gives a model investigated by Bollerslev 
and Mikkelsen (1996, 1999): 


$$\log(h_t) = \mu_{\log(h)} + (1 - \Delta L)^{-1}(1 - L)^{-d}(1 + \psi L)\,g(z_{t-1}). \tag{10.17}$$


Fractional differencing of $\log(h_t) - \mu_{\log(h)}$ then gives an ARMA(1, 1) process, 
hence $\log(h_t)$ can be defined by fractional integration (FI) of the ARMA process, 
i.e. it is an ARFIMA(1, d, 1) process. The acronym adopted for the returns process 
is then FIEGARCH(1, d, 1). 


10.3.2 Calculations 


Fractional differencing creates computational issues because the expansion in 
(10.15) must be truncated at some point. The coefficients of the powers $L^j$ are 
provided by 

$$(1 - L)^d = 1 - \sum_{j=1}^{\infty} a_j L^j, \quad a_1 = d, \quad a_j = \frac{j - d - 1}{j}\,a_{j-1}, \quad j \ge 2, \tag{10.18}$$

and they decay slowly, being asymptotically proportional to $j^{-(1+d)}$. This 
suggests truncation after a large number of terms, although the truncation limit N 
must not be too large compared with the number of available observations. The 
limit N = 1000 has often been selected. 
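
The recursion in (10.18) is easy to check numerically. The following Python sketch is illustrative only: the function name `frac_diff_coeffs` and the value d = 0.4 are assumptions made for this example and do not come from the studies cited.

```python
def frac_diff_coeffs(d, N):
    """Return [a_1, ..., a_N] with (1 - L)^d = 1 - sum_j a_j L^j, as in (10.18)."""
    a = [d]                                  # a_1 = d
    for j in range(2, N + 1):
        a.append(a[-1] * (j - d - 1) / j)    # a_j = a_{j-1} (j - d - 1) / j
    return a


a = frac_diff_coeffs(d=0.4, N=1000)

# Agreement with the binomial expansion (10.15): the L^2 term is d(d-1)/2 = -a_2.
print(a[1], -0.4 * (0.4 - 1.0) / 2.0)

# Hyperbolic decay: a_j * j^(1+d) is nearly constant for large j,
# so the weights are still far from negligible at lag 1000.
print(a[499] * 500 ** 1.4, a[999] * 1000 ** 1.4)
```

Because the weights decay like a power of j rather than geometrically, a short truncation discards a noticeable part of the filter, which is why large values of N are used.
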
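Given a choice for N, the conditional variances for the FIEGARCH(1, d, 1) 
model can be computed from 

$$\log(h_t) = \mu_{\log(h)} + \sum_{j=1}^{N} b_j\left[\log(h_{t-j}) - \mu_{\log(h)}\right] + g(z_{t-1}) + \psi\, g(z_{t-2}) \tag{10.19}$$

with the coefficients $b_j$ defined by 

$$(1 - \Delta L)(1 - L)^d = 1 - \sum_{j=1}^{\infty} b_j L^j, \quad b_1 = d + \Delta, \quad b_j = a_j - \Delta a_{j-1}, \quad j \ge 2. \tag{10.20}$$
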
For data indexed by times $t \ge 1$, some of the times $t - j$ in (10.19) will precede 
time 1. The terms $\log(h_{t-j})$ can be replaced by $\mu_{\log(h)}$ whenever $t - j \le 0$. To 
commence calculations, set $\log(h_1) = \mu_{\log(h)}$ and $g(z_0) = 0$. All the conditional 
variances are influenced by these substitutions. When data are plentiful it may 
then be advisable to estimate parameters by maximizing the likelihood function 
for a subperiod that excludes the first 1000 or so observations. 
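
A minimal Python sketch of the calculations in (10.18) to (10.20) follows. It is not the code used in any of the cited studies. The function and argument names are hypothetical, the conditional mean is assumed constant, and E|z| is set to its value for a standard normal variable, which is an assumption that must be changed for other conditional distributions.

```python
import math


def fiegarch_variances(r, mu, mu_logh, d, delta, psi, vartheta, gamma, N=1000):
    """Conditional variances h_1, ..., h_n of a FIEGARCH(1, d, 1) model via (10.19)."""
    # a_j from (10.18), then b_j from (10.20)
    a = [d]
    for j in range(2, N + 1):
        a.append(a[-1] * (j - d - 1) / j)
    b = [d + delta] + [a[k] - delta * a[k - 1] for k in range(1, N)]

    e_abs_z = math.sqrt(2.0 / math.pi)   # E|z| for standard normal z (assumption)

    def g(z):
        return vartheta * z + gamma * (abs(z) - e_abs_z)

    dev = []   # dev[t-1] = log(h_t) - mu_logh
    gz = []    # gz[t-1]  = g(z_t)
    h = []
    for t in range(1, len(r) + 1):
        if t == 1:
            x = 0.0                      # log(h_1) = mu_logh
        else:
            # terms with t - j <= 0 are zero because log(h_{t-j}) is set to mu_logh
            x = sum(b[j - 1] * dev[t - j - 1] for j in range(1, min(N, t - 1) + 1))
            x += gz[t - 2]               # g(z_{t-1})
            if t >= 3:
                x += psi * gz[t - 3]     # psi * g(z_{t-2}), with g(z_0) = 0
        dev.append(x)
        h_t = math.exp(mu_logh + x)
        h.append(h_t)
        gz.append(g((r[t - 1] - mu) / math.sqrt(h_t)))
    return h


# Illustrative call with made-up returns and parameter values.
returns = [0.010, -0.020, 0.005, 0.000, 0.015, -0.030]
h = fiegarch_variances(returns, mu=0.0, mu_logh=math.log(1e-4), d=0.6,
                       delta=0.7, psi=-0.3, vartheta=-0.1, gamma=0.2, N=1000)
print([round(math.sqrt(x), 5) for x in h])
```

The double loop costs of the order of n times N operations per likelihood evaluation, which is one practical reason for not choosing N larger than necessary.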


10.3.3 Examples 


FIEGARCH(1, d, 1) and more complicated structures are evaluated in Bollerslev 
and Mikkelsen (1996), using 9559 daily returns of the S&P 500 index from 1953 
to 1990. The maximum likelihood estimate of d is 0.633, with a standard error of 
0.063, when the conditional distributions are assumed to be normal for the purpose 
of estimating the parameters. As the estimate is ten standard errors above zero, the 
null hypothesis d = 0 is rejected and hence the ARCH model provides evidence 
against short memory specifications. Likewise, the null hypothesis d = 1 is also 
rejected, which indicates that an integrated process is not appropriate. Similar 
results are obtained by Bollerslev and Mikkelsen (1999). For the later period 
from 1989 to 1998 and the same specification of the variance process, Taylor 
(2002) estimates d to be 0.57 for daily returns from the S&P 100 index. 

All three studies find d > 0.5 when the likelihood of the data is maximized for 
the FIEGARCH(1, d, 1) model. Thus the ARFIMA process for $\log(h_t)$ appears 
to have an infinite variance. However, we will see evidence in Chapter 12 that 
high-frequency returns support the claim that d is less than 0.5 for the volatility 
process of both exchange rates and equity indices. 


10.4 Likelihood Methods 


In addition to estimating the parameters of a model by maximizing the likeli- 
hood function, it is also of interest to test hypotheses about the parameters and 
to estimate the standard errors of the parameter estimates. Appropriate likelihood 
theory is presented in Bollerslev and Wooldridge (1992) and in Bollerslev et al. 
(1994), building on results provided by Engle (1982) and Weiss (1984, 1986). 
Likelihood theorems are proved for the GARCH(1, 1) model, including the inte- 
grated specification IGARCH(1, 1), by Lee and Hansen (1994) and Lumsdaine 
(1996). Theorems for general ARCH(p) and GARCH(p, q) models are respec- 
tively proved by Kristensen and Rahbek (2004) and Ling and McAleer (2003). 

The methods and results depend on the specification of the distribution of the 
standardized residuals. First we consider the maximum likelihood estimate (MLE) 
when it is necessary to choose a distribution. Then we consider the quasi-MLE 
(QMLE) approach, which obtains results by maximizing the likelihood when the 
distribution is assumed normal but this assumption is not made when estimating 
the standard errors. Finally, some results for the models estimated in Chapter 9 
are discussed. 

In this section, $\theta$ denotes the p parameters of the model, so that $\theta = (\theta_1, \ldots, \theta_p)'$ 
is a p × 1 vector, and $\theta_0$ denotes the true value of $\theta$. The model states that the 
conditional mean and variance, $\mu_t(\theta)$ and $h_t(\theta)$, of the return $r_t$ are known at 
time $t - 1$ from information $I_{t-1}$. The standardized residuals $z_t = (r_t - \mu_t)/\sqrt{h_t}$ 
are i.i.d. observations from a distribution whose density function is $f(z \mid \theta)$. 
The mean and variance functions are assumed to be differentiable as often as 
necessary. 

It is not necessary to assume that the process {r;} is strictly stationary. Many 
results simplify when it is strictly stationary, however, to classical results for i.i.d. 
observations that are stated and derived in standard texts (e.g. Greene 2000). 


10.4.1 MLE 


The likelihood function $L(\theta)$ is the product of conditional densities. Its logarithm 
is the sum of the logarithms of the conditional densities, which are denoted by 
$l_t(\theta)$; thus, 

$$\log L(\theta) = \sum_{t=1}^{n} l_t(\theta)$$

with 

$$l_t(\theta) = \log f(r_t \mid I_{t-1}, \theta) = -\tfrac{1}{2}\log h_t(\theta) + \log f(z_t(\theta)). \tag{10.21}$$

The partial derivatives of $l_t(\theta)$ define the p × 1 score vector, $s_t(\theta)$, by 

$$s_{i,t}(\theta) = \frac{\partial l_t(\theta)}{\partial \theta_i}. \tag{10.22}$$

The MLE, denoted by $\hat\theta_{n,\mathrm{ML}}$, maximizes $\log(L(\theta))$, which requires solving the p 
equations 

$$\sum_{t=1}^{n} s_{i,t}(\theta) = 0, \quad 1 \le i \le p.$$

The MLE is consistent and its asymptotic distribution is normal when $\theta_0$ is not on 
the boundary of its parameter space, the conditional density is correctly specified 
and regularity conditions apply. 
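Define the p × p information matrix $A_0$ by 

$$(A_0)_{i,j} = -\lim_{n\to\infty} \frac{1}{n}\sum_{t=1}^{n} E\left[\frac{\partial^2 l_t}{\partial\theta_i\,\partial\theta_j}\right], \quad 1 \le i, j \le p, \tag{10.23}$$

evaluated at the true parameters $\theta_0$, which simplifies for a strictly stationary 
process to 

$$(A_0)_{i,j} = -E\left[\frac{\partial^2 l_t}{\partial\theta_i\,\partial\theta_j}\right].$$

Then the distribution of $\sqrt{n}(\hat\theta_{n,\mathrm{ML}} - \theta_0)$ converges to a p-variate normal 
distribution as $n \to \infty$, written here as 

$$\sqrt{n}(\hat\theta_{n,\mathrm{ML}} - \theta_0) \to N(0, A_0^{-1}). \tag{10.24}$$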


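Consistent estimates of the terms in the matrix $A_0$ are calculated from n 
observations by 

$$\hat A_{i,j} = -\frac{1}{n}\sum_{t=1}^{n}\frac{\partial^2 l_t}{\partial\theta_i\,\partial\theta_j} = -\frac{1}{n}\,\frac{\partial^2 \log L}{\partial\theta_i\,\partial\theta_j}, \tag{10.25}$$

with the derivatives evaluated at the MLE. 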
The matrix $A_0$ equals the expected value of the outer product of the scores 
when the conditional density is correctly specified. Then $A_0 = B_0$ with 

$$(B_0)_{i,j} = \lim_{n\to\infty}\frac{1}{n}\sum_{t=1}^{n} E\left[s_{i,t}(\theta)\,s_{j,t}(\theta)\right], \tag{10.26}$$

evaluated at $\theta_0$, which again simplifies to the standard result for i.i.d. 
observations when the ARCH process is strictly stationary. The terms in this matrix are 
consistently estimated by 

$$\hat B_{i,j} = \frac{1}{n}\sum_{t=1}^{n} s_{i,t}(\theta)\,s_{j,t}(\theta), \tag{10.27}$$

evaluated at $\hat\theta_{n,\mathrm{ML}}$. Only first derivatives are required to compute $\hat B$, which can be 
useful when it is difficult to calculate satisfactory second derivatives and hence 
$\hat A$. 

The BHHH algorithm (Berndt, Hall, Hall, and Hausman 1974) maximizes the 
log-likelihood by an iterative method that calculates $\hat B$ at each iteration. Thus 
this popular algorithm only requires first derivatives to find the MLE and to 
compute its covariance matrix. Numerical derivatives are routinely calculated 
by software. However, analytic derivatives are generally preferable and can be 
calculated easily for many models (Fiorentini, Calzolari, and Panatoni 1996); 
examples are presented later and in the appendix to this chapter. 

Hypothesis tests about one of the p parameters can be performed by construct- 
ing the usual ratio statistic. For example, consider testing the null hypothesis that 
the first parameter is a particular value, say $H_0: \theta_1 = \theta_1^*$. First find the MLE and 
an estimate $\hat C$ of its covariance matrix (either $n^{-1}\hat A^{-1}$ or $n^{-1}\hat B^{-1}$). Then the first 
element of the MLE and its estimated variance, say $\hat\theta_1$ and $\hat C_{1,1}$, define the ratio 

$$t = \frac{\hat\theta_1 - \theta_1^*}{\sqrt{\hat C_{1,1}}} \tag{10.28}$$

whose asymptotic distribution is the standard normal distribution, N(0, 1), 
assuming $\theta_1^*$ is inside the set of feasible values for $\theta_1$. General Wald tests, of a parameter 
constraint $H_0: c(\theta_0) = 0$, for a function c, can be performed using a formula 
given by Bollerslev et al. (1994, p. 2982). 

Likelihood-ratio (LR) tests can also be used to test a constraint on the true 
parameter vector $\theta_0$. For example, consider the following null hypothesis that 
specifies k < p of the parameter values: 

$$H_0: \theta_i = \theta_i^*, \quad 1 \le i \le k. \tag{10.29}$$


The MLE under the null hypothesis is provided by maximizing the log-likelihood 
function over the p - k unconstrained parameters to give a maximum value $L_0$. 
Denote the maximum of the unconstrained log-likelihood by $L_1$. Then the usual 
likelihood-ratio result can be used to decide the hypothesis test, namely that 


$$2(L_1 - L_0) \sim \chi^2_k \tag{10.30}$$


with ~ indicating that this is an asymptotic result. A large value of the test statis- 
tic indicates that the alternative hypothesis is much more likely than the null 
hypothesis to describe the data and then the null is rejected. 

The LR and Wald tests are asymptotically equivalent, but they can of course 
provide different results when evaluated for a finite sample of observations. Note 
that the asymptotic test theory does require modification when the null hypothesis 
restricts $\theta$ to be on the boundary of its parameter space. As defined here, the LR and 
Wald tests are then conservative and reject a true null hypothesis less often than 
the significance level of the test. For example, if the null hypothesis is a normal 
distribution and the alternative is a t-distribution with $\nu$ degrees of freedom, then 
the null is $H_0: \nu^{-1} = 0$, for which $\theta_0$ is on the boundary. Bollerslev (1987) notes 
that the LR test is then conservative. 


10.4.2 QMLE 


The distribution of the standardized residuals is generally unknown. If we specify 
a particular density function and we are wrong, then there is no guarantee that the 
MLE and its standard errors are consistent. Newey and Steigerwald (1997) show 
the MLE is consistent if the true and assumed densities are both symmetric. They 
also show the MLE is inconsistent if the true density is asymmetric, the assumed 
density is symmetric, and the conditional mean is not always zero. 

The QMLE methodology guarantees the consistency of estimates and standard 
errors by assuming normal distributions. There are then no distribution parameters 
in $\theta$. The logarithms of the conditional densities are now 


$$l_t(\theta) = -\tfrac{1}{2}\left[\log(2\pi) + \log(h_t(\theta)) + z_t^2(\theta)\right] \tag{10.31}$$


and maximizing the log-likelihood function provides the QMLE, denoted by 
$\hat\theta_{n,\mathrm{QML}}$. 
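
As an illustration of (10.31), the quasi-log-likelihood of a GARCH(1, 1) model with a constant mean can be accumulated in a few lines of Python. This is a sketch under stated assumptions: the function name is hypothetical and the first conditional variance is set equal to the unconditional variance, which is only one of several possible start-up choices.

```python
import math


def garch11_quasi_loglik(r, mu, omega, alpha, beta):
    """Sum of the l_t in (10.31) for h_t = omega + alpha (r_{t-1} - mu)^2 + beta h_{t-1}."""
    h = omega / (1.0 - alpha - beta)     # h_1 = unconditional variance (assumption)
    total = 0.0
    for t, r_t in enumerate(r):
        if t > 0:
            h = omega + alpha * (r[t - 1] - mu) ** 2 + beta * h
        z2 = (r_t - mu) ** 2 / h         # squared standardized residual
        total += -0.5 * (math.log(2.0 * math.pi) + math.log(h) + z2)
    return total


# Illustrative call with made-up returns and parameter values.
returns = [0.004, -0.011, 0.007, 0.002, -0.015, 0.009]
print(garch11_quasi_loglik(returns, mu=0.0, omega=2e-6, alpha=0.05, beta=0.93))
```

Passing the negative of this function to a numerical optimizer would give the QMLE; the optimizer itself is outside the scope of the sketch.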

The QMLE is consistent whatever the true distribution of the standardized 
residuals, assuming that the conditional mean and variance functions have been 
correctly specified. However, the covariance matrix of $\sqrt{n}\,\hat\theta_{n,\mathrm{QML}}$ is not 
consistently estimated in general by the sample counterpart of either $A_0^{-1}$ or $B_0^{-1}$. The 
asymptotic result is now 

$$\sqrt{n}(\hat\theta_{n,\mathrm{QML}} - \theta_0) \sim N(0, A_0^{-1}B_0A_0^{-1}). \tag{10.32}$$

The matrix $A_0^{-1}B_0A_0^{-1}$ is in general different to both $A_0^{-1}$ and $B_0^{-1}$ when the 
conditional distributions that define the observations are not normal. 

The QMLE is less accurate than the MLE when the conditional distributions are 
known and are not normal. Engle and González-Rivera (1991) quantify the reduc- 
tion in efficiency for the GARCH(1, 1) model with conditional t-distributions. 
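The ratios $\mathrm{var}(\hat\alpha_{\mathrm{ML}})/\mathrm{var}(\hat\alpha_{\mathrm{QML}})$ and $\mathrm{var}(\hat\beta_{\mathrm{ML}})/\mathrm{var}(\hat\beta_{\mathrm{QML}})$ are both 
asymptotically equal to 0.41 when the degrees of freedom parameter is 5; the ratios increase 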
to 0.82 when the degrees of freedom parameter is 8. 

The matrices Ao and Bo can be estimated, as before, by (10.25) and (10.27). The 
second derivatives in (10.25) can be avoided by using the following alternative 
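consistent estimate of $A_0$, given by Bollerslev and Wooldridge (1992), 


$$(\hat A_{\mathrm{BW}})_{i,j} = \frac{1}{n}\sum_{t=1}^{n}\left[\frac{1}{h_t}\frac{\partial\mu_t}{\partial\theta_i}\frac{\partial\mu_t}{\partial\theta_j} + \frac{1}{2h_t^2}\frac{\partial h_t}{\partial\theta_i}\frac{\partial h_t}{\partial\theta_j}\right]. \tag{10.33}$$
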
The standard errors provided by estimating $A_0^{-1}B_0A_0^{-1}$ using (10.27) and either 
(10.25) or (10.33) are called robust standard errors, to emphasize that they are 
applicable when the assumption of conditional normal distributions is known to 
be wrong. Robust Wald tests are performed in the obvious way using the QMLE 
and the robust standard errors. 

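Differentiation of (10.31) and some algebra shows that the score vector is 

$$s_{i,t} = \frac{z_t}{\sqrt{h_t}}\frac{\partial\mu_t}{\partial\theta_i} + \frac{z_t^2 - 1}{2h_t}\frac{\partial h_t}{\partial\theta_i}. \tag{10.34}$$

To compare the magnitudes of robust and nonrobust standard errors, suppose the 
conditional mean is a constant. Then $\partial\mu_t/\partial\theta_i = 0$, 

$$(\hat A_{\mathrm{BW}})_{i,j} = \frac{1}{n}\sum_{t=1}^{n}\frac{1}{2h_t^2}\frac{\partial h_t}{\partial\theta_i}\frac{\partial h_t}{\partial\theta_j} \tag{10.35}$$

and 

$$\hat B_{i,j} = \frac{1}{n}\sum_{t=1}^{n}\frac{(z_t^2 - 1)^2}{4h_t^2}\frac{\partial h_t}{\partial\theta_i}\frac{\partial h_t}{\partial\theta_j}. \tag{10.36}$$

Then the limit of the ratio $\hat B_{i,j}/(\hat A_{\mathrm{BW}})_{i,j}$, as n increases, is 

$$\kappa = (B_0)_{i,j}/(A_0)_{i,j} = \tfrac{1}{2}E[(z_t^2 - 1)^2] = \tfrac{1}{2}(E[z_t^4] - 1) \quad \text{for all } i \text{ and } j \tag{10.37}$$
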

(Pagan 1996). Since observed $z_t$ are leptokurtic, $\kappa > 1$. The covariance matrices, 
in descending order of magnitude, are then approximately $n^{-1}\kappa A_0^{-1}$ for the robust 
method, $n^{-1}A_0^{-1}$ for the information matrix, and $n^{-1}\kappa^{-1}A_0^{-1}$ for the outer 
product of the scores. These results show that the robust standard errors of parameters 
in the variance equation will exceed the values provided by the alternative 
standard methods, when normal distributions are assumed but the $z_t$ are leptokurtic. 
Standard errors that are not robust will often be unreliable, because the excess 
kurtosis is often substantial. 
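
The quantities in (10.35) to (10.37) can be sketched in Python as follows. The function names are hypothetical, the conditional mean is assumed constant, and the inputs (standardized residuals, conditional variances, and the derivatives of $h_t$) are assumed to have been computed already at the QMLE.

```python
def bw_and_outer_product(z, h, dh):
    """Return (A_bw, B_hat) from (10.35) and (10.36) for a constant conditional mean.

    z[t]  : standardized residual z_t
    h[t]  : conditional variance h_t
    dh[t] : partial derivatives of h_t, one entry per variance parameter
    """
    n, p = len(z), len(dh[0])
    A = [[0.0] * p for _ in range(p)]
    B = [[0.0] * p for _ in range(p)]
    for z_t, h_t, grad in zip(z, h, dh):
        wa = 1.0 / (2.0 * h_t ** 2)                     # weight in (10.35)
        wb = (z_t ** 2 - 1.0) ** 2 / (4.0 * h_t ** 2)   # weight in (10.36)
        for i in range(p):
            for j in range(p):
                A[i][j] += wa * grad[i] * grad[j] / n
                B[i][j] += wb * grad[i] * grad[j] / n
    return A, B


def kappa_from_kurtosis(kurtosis):
    """kappa = (E[z^4] - 1) / 2 from (10.37), for standardized residuals."""
    return 0.5 * (kurtosis - 1.0)


print(kappa_from_kurtosis(3.0))   # normal kurtosis: the three methods agree
print(kappa_from_kurtosis(6.0))   # fat tails: the robust covariance is inflated
```

The robust covariance matrix then follows by inverting `A`, forming the product with `B`, and dividing by n, which requires a matrix library and is omitted here.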

All of the above theory is asymptotic. There are few Monte Carlo investigations 
into the accuracy of the theory for samples of hundreds or thousands of obser- 
vations. Bollerslev and Wooldridge (1992), Lumsdaine (1995), and Fiorentini et 
al. (1996) provide some results for relatively short time series, while Bollerslev 
and Mikkelsen (1996) give results for long series from GARCH models and long 
memory extensions. 


10.4.3 GARCH(1, 1) Example 


The GARCH(1, 1) model has four parameters, with $\theta = (\mu, \omega, \alpha, \beta)'$. Assuming 
symmetric distributions, the matrices $A_0$ and $B_0$ are block diagonal, with the mean 
parameter $\mu$ in one block and the three variance parameters in the other block 
(Engle 1982). Thus we set all terms outside the diagonal blocks of $\hat A$ and $\hat B$ to zero. 
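
The robust covariance matrix for $(\hat\omega, \hat\alpha, \hat\beta)'$ can be obtained after evaluating the 
3 × 3 matrices in equations (10.35) and (10.36). This can be done using analytic 
derivatives of $h_t$, by evaluating 

$$\frac{\partial h_t}{\partial\omega} = 1 + \beta\frac{\partial h_{t-1}}{\partial\omega},$$

$$\frac{\partial h_t}{\partial\alpha} = (r_{t-1} - \mu)^2 + \beta\frac{\partial h_{t-1}}{\partial\alpha},$$

$$\frac{\partial h_t}{\partial\beta} = h_{t-1} + \beta\frac{\partial h_{t-1}}{\partial\beta},$$

at the QMLE. The initial partial derivatives can be set either to their unconditional 
expectations, 

$$\frac{\partial h_1}{\partial\omega} = \frac{1}{1 - \beta}, \quad \frac{\partial h_1}{\partial\alpha} = \frac{\partial h_1}{\partial\beta} = \frac{\omega}{(1 - \beta)(1 - \alpha - \beta)},$$

or to zero (when $h_1$ is not a function of the parameters). 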
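
The derivative recursions above translate directly into code. The text reports that the calculations were done with Excel and Visual Basic; the Python below is a separate, minimal sketch with hypothetical names. By default it treats $h_1$ as a fixed number with zero initial derivatives, and the unconditional expectations can be passed through `dh1` instead.

```python
def garch11_h_and_derivs(r, mu, omega, alpha, beta, h1, dh1=(0.0, 0.0, 0.0)):
    """Conditional variances h_t and their derivatives w.r.t. (omega, alpha, beta).

    h1  : first conditional variance, treated as fixed
    dh1 : initial derivatives (zeros, or the unconditional expectations)
    """
    h, dh = h1, list(dh1)
    hs, dhs = [h], [dh]
    for t in range(1, len(r)):
        e2 = (r[t - 1] - mu) ** 2
        # each derivative uses h_{t-1} and the derivatives at time t-1
        dh = [1.0 + beta * dh[0],
              e2 + beta * dh[1],
              h + beta * dh[2]]
        h = omega + alpha * e2 + beta * h
        hs.append(h)
        dhs.append(dh)
    return hs, dhs


# Illustrative call with made-up returns; start-up at the unconditional values.
omega, alpha, beta = 1e-5, 0.05, 0.90
h_bar = omega / (1.0 - alpha - beta)
start = (1.0 / (1.0 - beta), h_bar / (1.0 - beta), h_bar / (1.0 - beta))
hs, dhs = garch11_h_and_derivs([0.01, -0.02, 0.015], 0.0, omega, alpha, beta,
                               h1=h_bar, dh1=start)
print(dhs[-1])
```

The lists `hs` and `dhs`, together with the standardized residuals, are the inputs needed for (10.35) and (10.36).
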
The QMLE for the DM/$ from 1991 to 2000 was obtained in Section 9.4 using 
Excel. Further Excel calculations that utilize Visual Basic code provide the terms 
in the matrices and the following robust standard errors for the estimates: 


Parameter    Estimate              Robust standard error 
$\mu$        1.38 × 10^{-4}        1.23 × 10^{-4} 
$\omega$     4.24 × 10^{-7}        1.90 × 10^{-7} 
$\alpha$     0.0354                0.00806 
$\beta$      0.9554                0.00966 

The robust estimate of the correlation between $\hat\alpha$ and $\hat\beta$ is -0.867 and the robust 
standard error of the persistence estimate $\hat\phi = \hat\alpha + \hat\beta = 0.9908$ is 0.00482. 

The t-ratio for the null hypothesis $\mu = 0$ equals 1.12, so this hypothesis is 
accepted at conventional significance levels. The interesting null hypothesis of a 
unit root in the variance process, i.e. $\phi = 1$, produces a t-ratio equal to (0.9908 - 
1)/0.00482 = -1.91, which is discussed further in Section 10.5. 
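
The standard error of the persistence estimate follows from the usual variance formula for a sum of two correlated estimates. The short Python check below uses only the rounded numbers quoted above; the function name is hypothetical.

```python
import math


def se_of_sum(se_a, se_b, corr):
    """Standard error of a + b from the two standard errors and their correlation."""
    return math.sqrt(se_a ** 2 + se_b ** 2 + 2.0 * corr * se_a * se_b)


alpha_hat, beta_hat = 0.0354, 0.9554
se_phi = se_of_sum(0.00806, 0.00966, -0.867)
phi_hat = alpha_hat + beta_hat
t_unit_root = (phi_hat - 1.0) / se_phi
print(round(se_phi, 5), round(t_unit_root, 2))
```

The strong negative correlation between the two estimates is what makes the standard error of their sum much smaller than either individual standard error.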

The robust standard errors of the variance parameters exceed those from the 
information matrix and the products of the scores by amounts that are similar to 
those predicted by (10.37). The standardized residuals have kurtosis equal to 4.61, 
so that $\kappa$ is estimated to be 1.80. The values of $\hat B_{i,j}/(\hat A_{\mathrm{BW}})_{i,j}$ are near $\kappa$ and range 
from 1.77 to 2.03. Dividing the robust standard errors by the standard errors from 
the products of the scores gives 1.80, 1.82, and 1.64, again approximately equal 
to $\kappa$. 


10.4.4 GJR(1, 1)-MA(1)-M Example 


The GJR(1, 1)-MA(1)-M model defined in Section 9.8 has three mean parameters 
and four variance parameters. Parameter estimates for the S&P 100 index series 
are given in Table 9.3, with normal, t-, and GED distributions. The nonnormal 
specifications add an eighth parameter to $\theta$. The standard errors of the parameter 
estimates are also provided in Table 9.3. They are all calculated using the outer 
product matrix $\hat B$ of equation (10.27), and for the normal specification they are 
robust and also use the matrix $\hat A_{\mathrm{BW}}$ from (10.33). These matrices contain first 
derivatives that are evaluated analytically, using the formulae in the appendix to 
this chapter. The analytic calculations are not difficult if Visual Basic code is 
used. Numerical first derivatives may give less accurate results, while satisfactory 
numerical second derivatives could not be obtained when an attempt was made 
to evaluate the matrix Â in (10.25). 

Tests about the values of $\lambda$ and the persistence $\phi$ are deferred to the next section. 
The negative estimates of $\mu$ are counterintuitive. However, their standard errors 
are seen to be relatively large and 95% confidence intervals for $\mu$ include sensible 
positive values. The MA(1) parameter estimates are insignificant at conventional 
significance levels. The t-ratios for the null hypothesis $\Theta = 0$ are 0.54, -0.54, 
and -0.82 for the three distributions. 

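The robust correlations between the parameter estimates for the normal 
specification include $\mathrm{cor}(\hat\mu, \hat\lambda) = -0.95$. The only other correlation outside the range 
±0.6 is $\mathrm{cor}(\hat\omega, \hat\beta) = -0.62$. Also, $\mathrm{cor}(\hat\alpha, \hat\alpha^-) = -0.47$, $\mathrm{cor}(\hat\alpha, \hat\beta) = -0.36$, and 
$\mathrm{cor}(\hat\alpha^-, \hat\beta) = -0.54$. 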


10.5 Results from Hypothesis Tests 


The selection of an ARCH model can be guided by hypothesis tests based upon 
the likelihood theory summarized in the previous section. We review methods and 
results for some important tests, about the shape of the conditional distribution, 
about the dependence of the conditional mean on the conditional variance, and 
about the persistence of the conditional variance. 


10.5.1 Tests for Conditional Normality 


The null hypothesis that the conditional distribution of daily returns is normal 
is usually rejected comprehensively, commencing with the likelihood-ratio (LR) 
test results of Taylor and Kingsman (1979) for two commodity series and of 
Bollerslev (1987) for two exchange rate series. The LR test is decided by the 
value of $2(L_1 - L_0)$ with $L_0$ the maximum log-likelihood for conditional normal 
distributions and with $L_1$ the maximum value for a larger class of conditional 
distributions, which includes the normal as a special case. The two most popular 
standardized alternatives are the standardized t-distribution and the generalized 
error distribution (GED) (see Section 9.6), which each have a single shape 
parameter, respectively the degrees of freedom $\nu$ and the tail-thickness $\eta$. When $1/\nu = 0$ 
or $\eta = 2$ the distribution is normal. As these alternative distributions have one 
extra parameter, the test statistic is then compared with a critical value from the 
$\chi^2_1$ distribution; for example, the null hypothesis is rejected at the 5% level if the 
statistic exceeds 3.84. This test gives conservative results for the t-distribution 
alternative, because zero is a boundary value for 1/v (Bollerslev 1987). 

Table 9.3 lists the maximum log-likelihoods for the S&P 100 dataset and the 
GJR(1, 1)-MA(1)-M model estimated in Section 9.8. The LR test values are 
respectively 128.4 and 113.8 for the t- and GED alternatives. These values fall 
in the far right tail of the $\chi^2_1$ distribution and thus reject the conditional normal 
hypothesis decisively. This is a standard result in the literature and I am unaware 
of any study of daily data that accepts the conditional normal hypothesis. Taylor 
(1994a) is one of many studies that reject the normal hypothesis for foreign 
exchange, with all LR test values exceeding 120 for the DM/$ rate from 1977 to 
1990. 

A second test procedure requires the standard errors (s.e.) of the shape 
parameters. For the GED alternative, $t = (\hat\eta - 2)/\mathrm{s.e.}(\hat\eta)$ is compared with a standard 
normal distribution; this gives -31.2 for the S&P 100 dataset. With $\zeta = \nu^{-1}$, 
$t = \hat\zeta/\mathrm{s.e.}(\hat\zeta)$ can be evaluated and interpreted with caution; this statistic 
is 0.1512/0.0186 = 8.13 for the S&P 100 dataset. 


10.5.2 Do Expected Returns Depend on Volatility? 


A positive relationship between the conditional mean $\mu_t$ and the conditional 
variance $h_t$ might appear plausible, particularly when the asset approximates the 
market portfolio. The evidence for such a relationship can be assessed by 
estimating an ARCH-M model that includes $h_t$ in the equation that defines $\mu_t$. For 
our S&P 100 example, 


$$\mu_t = \mu + \lambda\sqrt{h_t} + \Theta e_{t-1}, \tag{10.38}$$


and the relevant null hypothesis is $\lambda = 0$. The alternative of interest, $\lambda > 0$, 
is essentially an econometric hypothesis, because asset pricing theory neither 
prescribes the functional relationship between $\mu_t$ and $h_t$ nor requires it to be 
constant through time (see Glosten et al. 1993). A popular alternative relationship, 
used in much of the literature cited below, replaces $\sqrt{h_t}$ in (10.38) by $h_t$. Then $\lambda$ 
can be interpreted as a measure of relative risk aversion. 

Several methods are available for testing a hypothesis such as $\lambda = 0$ in (10.38). 
Two of these are the LR test and the usual Wald test, which requires specification 
of the shape of the conditional distribution; a third is the robust Wald test that 
assumes normal distributions and then utilizes robust standard errors. For our 
S&P 100 example, first consider results for conditional t-distributions when the 
value of $L_1$ is given in Table 9.3 as 8495.13. When the constraint $\lambda = 0$ is applied 
we obtain $L_0 = 8493.91$. Hence the LR test value is 2 × 1.82 = 3.64 and we 
accept the null hypothesis at the 5% level, because 3.64 is less than the 95th 
percentile of $\chi^2_1$ which is 3.84. Again from Table 9.3, the MLE of $\lambda$ is 0.1112, 
with standard error 0.0577, giving a Wald test value of 0.1112/0.0577 = 1.93. 
Once more we narrowly accept $\lambda = 0$ at the 5% level, since 1.93 is within ±1.96. 
These conclusions might be considered suspect if we thought the true conditional 
density was far from the assumption of a t-density. To avoid such doubts, we can 
use the QMLE of $\lambda$ and its robust standard error, 0.1377 and 0.0687 respectively, 
to obtain the robust Wald test value of 2.00. On this occasion the robust test just 
rejects $\lambda = 0$ at the 5% level, in contrast to the other tests. 
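
The three test decisions can be reproduced from the quoted statistics with the Python standard library alone. This is only an illustrative sketch with hypothetical function names; it relies on the identity that the upper tail probability of $\chi^2_1$ at x equals erfc(sqrt(x/2)), and it takes the LR value 3.64 directly from the text.

```python
import math


def chi2_1_pvalue(x):
    """P(chi-squared with 1 degree of freedom > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))


def two_sided_normal_pvalue(t):
    """P(|Z| > |t|) for a standard normal Z."""
    return math.erfc(abs(t) / math.sqrt(2.0))


lr_stat = 3.64                      # LR test value quoted in the text
wald_t = 0.1112 / 0.0577            # MLE and standard error, conditional t-distributions
robust_t = 0.1377 / 0.0687          # QMLE and robust standard error

results = [("LR", chi2_1_pvalue(lr_stat)),
           ("Wald", two_sided_normal_pvalue(wald_t)),
           ("robust Wald", two_sided_normal_pvalue(robust_t))]
for name, p in results:
    decision = "reject" if p < 0.05 else "accept"
    print(name, round(p, 3), decision)
```

All three p-values lie close to 0.05, which is why the tests can disagree about the decision at the 5% level.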

The literature about the sign and the significance of $\lambda$ does not provide simple 
conclusions for the US equity market. Bollerslev et al. (1992, p. 25) state that 
almost all of the early studies find $\lambda$ is positive and significantly different from zero 
at the 5% level, examples including French, Schwert, and Stambaugh (1987) and 
Chou (1988). These studies predate the use of asymmetric volatility specifications. 
As incorrect specification of the conditional variance equation generally leads to 
inconsistent estimates of $\lambda$ (noted, for example, by Pagan and Ullah 1988), the 
early evidence may be unreliable. 

The EGARCH and GJR-GARCH studies of Nelson (1991) and Glosten et al. 
(1993) report negative estimates of $\lambda$ for excess returns from the CRSP 
value-weighted market portfolio; these are returns in excess of the risk-free rate. Nelson 
gives t = -3.36/2.03 = -1.66 using daily returns from 1962 to 1987 and the 
GED density, while Glosten et al. give the robust test values t = -2.83 and 
t = -2.43 for their models 5 and 5-L, using monthly returns from 1951 to 1989. 

The intensive study of US index returns by Bollerslev et al. (1994) uses a variant 
of the EGARCH model and finds that the estimates of $\lambda$ (their parameter $\mu_3$) are 
positive for daily returns in all four periods, 1885-1914, 1914-1928, 1928-1952, 
and 1953-1990. The respective values of t are 0.03, 0.71, 2.79, and 0.23 so that 
the null hypothesis $\lambda = 0$ is rejected at the 5% level only for the period from 1928 
to 1952. We may also note that Blair et al. (2002) estimate that $\lambda$ is positive for 
daily returns from the S&P 100 index and 70% of its constituent firms, from 1983 
to 1992, with the null hypothesis $\lambda = 0$ accepted at the 10% level for the index 
and 90% of the firms. Their index result contrasts with the significant estimates 
of $\lambda$ (at the 10% level) that are documented in Table 9.3 for the same index in the 
later period from 1991 to 2000. 

In conclusion, if there is a relationship between the conditional daily mean $\mu_t$ 
and the conditional daily variance $h_t$ for the US market, then it appears to vary 
through time, with either positive or negative dependence being possible. 

A different conclusion is obtained by Ghysels, Santa-Clara, and Valkanov 
(2004a) for the conditional monthly mean return of the CRSP value-weighted 
index from 1928 to 2000. They document significant evidence for a positive risk- 
return trade-off when the conditional monthly variance is estimated by using a 
function of squared daily returns. 

Studies of UK index volatility have not produced significant values of $\lambda$. All 
estimates of $\lambda$ are positive but insignificant at the 5% level for the FT All-Share 
index from 1965 to 1989 (Poon and Taylor 1992) and all estimates are insignificant 
for the FTSE 100 index from 1985 to 1994 (Taylor 2000). 


10.5.3 Tests for a Unit Root in Volatility 


We next consider tests of the null hypothesis that the volatility process has a unit 
root. First suppose the alternative hypothesis is a stationary, short memory process 
for volatility. The null hypothesis is often stated as $\phi = 1$, with $\phi$ the persistence 
parameter that equals $\alpha + \beta$ for the GARCH(1, 1) model, $\alpha + \frac{1}{2}\alpha^- + \beta$ for the 
GJR(1, 1) model, and $\Delta$ for the EGARCH(1) model. The alternative hypothesis 
is $\phi < 1$. For models with more autoregressive parameters, $\phi$ is defined as the 
largest root of a polynomial equation. For example, in an EGARCH(2, q) model 
the filter $(1 - \Delta_1 L - \Delta_2 L^2)$ multiplies $\log(h_t)$ and typically it can be factored 
as $(1 - \phi L)(1 - \phi_1 L)$ with $\phi$ and $\phi_1$ real numbers and $|\phi| > |\phi_1|$, as in Nelson 
(1991). 

Estimates of $\phi$ very often exceed 0.97 for series of daily returns, but since 
estimated standard errors of $\hat\phi$ are often less than 0.02 it is necessary to perform 
hypothesis tests. 

The restriction $\phi = 1$ defines models that are not covariance stationary. The 
GARCH(1, 1) model is, however, strictly stationary when $\phi = 1$. The results of 
Lee and Hansen (1994) and Lumsdaine (1995, 1996) show that the robust Wald 
test of the unit root hypothesis does not need to be modified in the GARCH(1, 1) 
context, but Lumsdaine shows the likelihood ratio and Lagrange multiplier tests 
are unreliable. Generally, the Wald test is performed without worrying about its 
properties in a unit root context, which may be dubious for the EGARCH model 
in particular. 

The evidence against a unit root in foreign exchange volatility is fairly strong 
and quite different to the evidence for equity volatility. The robust t-ratio for 
the daily DM/\$ series from 1991 to 2000, modeled by GARCH(1, 1), equals 
$(0.9908 - 1)/0.00482 = -1.91$. A one-tailed test is reasonable and then the null 
hypothesis is rejected at the 5% level. The unit root hypothesis is rejected for the 
same exchange rate in earlier years. Taylor (1994a) gives results from 1977 to 
1990. His robust t-ratios are $(0.9702 - 1)/0.0111 = -2.68$ for GARCH(1, 1) 
and $(0.9607 - 1)/0.0117 = -3.36$ for the symmetric EGARCH(1) model. When 
nonnormal specifications are estimated the t-ratios vary from -2.28 to -3.93, 
while subperiod robust t-ratios are -2.96 (1977-1983) and -2.34 (1984-1990). 
All these test values reject the unit root hypothesis at low significance levels. 
Bollerslev et al. (1994) produce a more significant result for the similar period 
from 1981 to 1992, namely $(0.948 - 1)/0.0138 = -3.78$. There is less evidence 
against a unit root in the initial years of floating exchange rates. Engle and 
Bollerslev (1986) estimate that $\phi$ is 0.996 for weekly observations of the Swiss franc 
rate against the dollar from 1973 to 1985, while Engle and González-Rivera (1991) 
give estimates of 0.998 and 1.050 for daily observations of the £/\$ rate from 1974 
to 1983, respectively for normal and t-distributions. 
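
The t-ratios quoted above can be reproduced with a few lines of code. The following is a minimal sketch, not the procedure used in the cited studies: the function names are illustrative, and the one-tailed p-value assumes the robust t-ratio is approximately standard normal under the null hypothesis, which the text notes may be questionable in a unit root context.

```python
import math

def unit_root_t_ratio(persistence_estimate, std_error):
    """Wald t-ratio for the null 'persistence = 1' against 'persistence < 1'."""
    return (persistence_estimate - 1.0) / std_error

def one_tailed_p_value(t_ratio):
    """Lower-tail probability of a standard normal variable (assumed reference law)."""
    return 0.5 * (1.0 + math.erf(t_ratio / math.sqrt(2.0)))

# Estimates and robust standard errors quoted in the text.
examples = {
    "GARCH(1,1), 1991-2000": (0.9908, 0.00482),
    "GARCH(1,1), 1977-1990": (0.9702, 0.0111),
    "EGARCH(1), 1977-1990": (0.9607, 0.0117),
}
for label, (estimate, std_error) in examples.items():
    t = unit_root_t_ratio(estimate, std_error)
    print(label, round(t, 2), round(one_tailed_p_value(t), 4))
```

A one-tailed comparison is used because the alternative hypothesis is persistence below one.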

The volatility of US equity indices often appears to contain a unit root. 
Bollerslev et al. (1992) state that the unit root null hypothesis is accepted by French 
et al. (1987), Chou (1988), Pagan and Schwert (1990), and Schwert and Seguin 
(1990). The EGARCH estimate of Nelson (1991) is $\hat\phi = 0.99962$, with standard 
error 0.00086 and $t = -0.00038/0.00086 = -0.45$, for a CRSP value-weighted 
index from 1962 to 1987. Bollerslev et al. (1994) have three estimates of $\phi$ out 
of four very near to 1 in their study of indices from 1885 to 1990. The estimates 
of their parameter $\Delta_1$ are 0.9942, 0.9093, 0.9994, and 0.9979, respectively 
for 1885-1914, 1914-1928, 1928-1952, and 1953-1990. These long series have 
small standard errors for $\hat\phi$, so that the robust t-ratios are not close to zero; they 
equal -1.76, -5.27, -0.67, and -1.91. 

The persistence estimates given in Table 9.3 for the S&P 100 index in the 
more recent years from 1991 to 2000 are not so near to unity. These GJR(1, 1) 
persistence estimates $\hat\phi$ are respectively 0.9867, 0.9913, and 0.9902 for the normal, 
t- and GED conditional distributions. Computation of the standard errors of these 
estimates requires either the 3 x 3 covariance matrix of $(\hat\alpha, \hat\alpha^-, \hat\beta)'$ or the variance 
of $\hat\phi$ for a reparametrization of the model (say with $\phi$ replacing $\beta$). The two 
methods will provide slightly different answers. The 3 x 3 matrix is used here and 
then the robust t-ratio is -2.48 for the normal specification, and -2.10 and -2.30 
for the t- and GED specifications. All these t-ratios reject a unit root at the 5% 
significance level. Blair et al. (2002) have a much lower persistence estimate for 
the same index and model from 1983 to 1992, which is highly sensitive to the crash 
return on 19 October 1987. They report $\hat\phi = 0.9289$ for the GJR(1, 1) model, but 
$\hat\phi = 0.9755$ when a crash dummy variable is included in the conditional variance 
equation. The median estimate of $\phi$ is 0.9732 for the constituent stocks of the 
index when the dummy is included. A few of these estimates exceed 1. There 
is no correlation between $\hat\phi$ and the size of the S&P 100 firms. The impact of 
crashes on persistence was also noted by Friedman and Laibson (1989) in a study 
of quarterly returns. 
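
The first of the two methods mentioned above, using the 3 x 3 covariance matrix, amounts to the variance of a linear combination of the three estimates. The sketch below assumes the GJR(1, 1) persistence is the first parameter plus half the second plus the third, as defined in this section. The covariance matrix is hypothetical and chosen only for illustration; it is not taken from Table 9.3.

```python
import math

# Weights of (alpha, alpha_minus, beta) in the GJR(1,1) persistence measure.
PERSISTENCE_WEIGHTS = (1.0, 0.5, 1.0)

def persistence_std_error(covariance):
    """Standard error of alpha + 0.5 * alpha_minus + beta from a 3 x 3 covariance matrix."""
    variance = sum(
        PERSISTENCE_WEIGHTS[i] * PERSISTENCE_WEIGHTS[j] * covariance[i][j]
        for i in range(3)
        for j in range(3)
    )
    return math.sqrt(variance)

# Hypothetical robust covariance matrix of the three estimates (illustration only).
hypothetical_cov = [
    [4.0e-5, -1.0e-5, -3.0e-5],
    [-1.0e-5, 9.0e-5, -2.0e-5],
    [-3.0e-5, -2.0e-5, 8.0e-5],
]
std_error = persistence_std_error(hypothetical_cov)
print(std_error)
```

Negative covariances between the estimates reduce the standard error of the sum, which is one reason the persistence can be estimated more precisely than its individual components.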

A second possible statement of the alternative hypothesis is a stationary, long 
memory process for volatility. In the notation of Section 10.3, the null and 
alternative hypotheses are then d = 1 and d < 1 respectively. There are few 
examples of this hypothesis test for ARCH models. Bollerslev and Mikkelsen 
(1996) perform the test for the S&P 500 index from 1953 to 1990, and from the 
FIEGARCH(1, d, 1) model obtain 


$$t = (\hat d - 1)/\text{s.e.}(\hat d) = (0.633 - 1)/0.063 = -5.83.$$

This is very strong evidence against the unit root hypothesis. It contrasts with $t = 
-1.91$ when the volatility alternative for the same series is the best short memory 
EGARCH process of Bollerslev et al. (1994). As the fractionally integrated (FI) 
model has a much higher likelihood than the simpler EGARCH specification, it is 
natural to prefer the FI test result. The difference between the test values (-5.83 
against -1.91) is presumably a consequence of the long memory alternative 
defining a more powerful test of the unit root hypothesis on this occasion. 

The long memory test is evidently important and it merits more research. Further 
series should be tested and the reliability of the Wald test for a unit root in a 
FIEGARCH model should be assessed. 


10.6 Model Building 


The choice of ARCH model in an empirical study will depend on many factors, 
such as the purpose of the modeling exercise, the expertise of the researcher, the 
time that can be devoted to the exercise, and the available software and data. Con- 
ditional variances and volatility forecasts that are fairly accurate can be obtained 
from simple models. Option prices that avoid the assumption of constant volatility 
can also be obtained from simple models (see Chapter 14). Models with more 
parameters and/or a more complicated mathematical structure are necessary when 
attempting to describe the stochastic process followed by prices as accurately as 
possible. The additional effort required to understand and estimate a more detailed 
model can provide more incisive tests of interesting hypotheses. For example, 
including long memory possibilities in the modeling framework can provide dif- 
ferent and relevant evidence about the hypothesis of a unit root in volatility, as 
noted in Section 10.5. 

A model can be selected either from a fixed set of candidate models or by fol- 
lowing a sequential procedure. Models are compared using the maximum values 
of the log-likelihood function, by testing special cases against more general possi- 
bilities and by evaluating diagnostic tests. Out-of-sample predictions can also be 
used to select a model, particularly when forecasting is the goal of the modeling 
exercise. 

Methods developed initially for models that have i.i.d. residuals are often 
applied in the ARCH context even if their exact properties are then unknown. For 
example, a quick way to select a model from a fixed set is to optimize the 
information criterion of either Akaike (1974) or Schwarz (1978), respectively denoted 
AIC and SIC. For ARCH models this requires maximizing either $2\log L(\hat\theta) - 2P$ 
for AIC or $2\log L(\hat\theta) - P\log(n)$ for SIC, with $\hat\theta$ the MLE for a model that has 
P parameters estimated from n observations. As the SIC criterion consistently 
estimates the orders p and q of a GARCH(p, q) model, SIC may be preferred 
to AIC. Bollerslev and Mikkelsen (1996) show that these criteria usually find the 
correct model in their Monte Carlo study of GARCH(1, 1), IGARCH(1, 1), and 
the fractionally integrated extension of GARCH(1, 1). 
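
As a rough illustration of the two criteria, the sketch below evaluates them in the "larger is better" form used above and picks the best candidate under each. The model labels, log-likelihoods, parameter counts, and sample size are hypothetical and serve only to show that SIC penalizes extra parameters more heavily than AIC once log(n) exceeds 2.

```python
import math

def aic(max_log_likelihood, n_params):
    """Akaike criterion in the 'maximize' form: 2 log L - 2P."""
    return 2.0 * max_log_likelihood - 2.0 * n_params

def sic(max_log_likelihood, n_params, n_obs):
    """Schwarz criterion in the 'maximize' form: 2 log L - P log(n)."""
    return 2.0 * max_log_likelihood - n_params * math.log(n_obs)

def select_model(candidates, criterion):
    """Return the label of the candidate with the highest criterion value."""
    return max(candidates, key=lambda label: criterion(*candidates[label]))

# Hypothetical candidates: label -> (maximized log-likelihood, number of parameters).
candidates = {"A": (1000.0, 3), "B": (1004.0, 5), "C": (1004.5, 8)}
n_obs = 3000  # hypothetical sample size

best_by_aic = select_model(candidates, aic)
best_by_sic = select_model(candidates, lambda ll, p: sic(ll, p, n_obs))
print(best_by_aic, best_by_sic)
```

In this made-up case the two criteria disagree: the gain of 4 in log-likelihood from two extra parameters is enough for AIC but not for SIC.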


10.6.1 Diagnostic Checks 


ARCH models are constructed from an i.i.d. sequence of standardized residuals 
$z_t = (r_t - \mu_t)/\sqrt{h_t}$. From returns data $\{r_t\}$ and an estimated model we can 
calculate the estimated standardized residuals $\hat z_t$, by replacing the parameter $\theta$ in the 
equations for $\mu_t$ and $h_t$ by the MLE $\hat\theta$. These terms $\hat z_t$ will almost be observations 
from an i.i.d. sequence when the model is correctly specified; they will not be 
exactly i.i.d. because $\hat\theta \ne \theta$. As the autocorrelations of either $|r_t|$ or $r_t^2$ are the 
most direct evidence for conditional heteroskedasticity, it is logical to examine 
the autocorrelations of $|\hat z_t|$ and/or $\hat z_t^2$ to see if there is any predictability in the 
terms $\hat z_t$. 

The most popular diagnostic test statistic is a Q-statistic, which is defined in the 
same way as the Box and Pierce (1970) statistic computed from the residuals of 
a homoskedastic ARMA model. For autocorrelations $R_{\tau,|\hat z|}$ and $R_{\tau,\hat z^2}$ calculated 
from n observations of $|\hat z_t|$ and $\hat z_t^2$, 

$$Q_k^{|z|} = n \sum_{\tau=1}^{k} R_{\tau,|\hat z|}^2 \qquad (10.39)$$

and 

$$Q_k^{z^2} = n \sum_{\tau=1}^{k} R_{\tau,\hat z^2}^2. \qquad (10.40)$$

The statistic $Q_k$ can also be calculated from the autocorrelations of the $\hat z_t$, although 
large values are unlikely because returns are almost uncorrelated. 
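
The Q-statistics can be computed directly from a list of standardized residuals. The sketch below is one possible implementation, assuming the usual sample autocorrelation with the full-sample mean and variance; the function names are illustrative and the residuals are simulated i.i.d. normal values rather than output from a fitted model.

```python
import random

def autocorrelation(series, lag):
    """Sample autocorrelation at a positive lag, using the full-sample mean."""
    n = len(series)
    mean = sum(series) / n
    denominator = sum((value - mean) ** 2 for value in series)
    numerator = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return numerator / denominator

def box_pierce_q(series, k):
    """n times the sum of the first k squared autocorrelations."""
    n = len(series)
    return n * sum(autocorrelation(series, lag) ** 2 for lag in range(1, k + 1))

def arch_q_statistics(residuals, k=20):
    """Q-statistics for the residuals, their absolute values, and their squares."""
    return {
        "Q": box_pierce_q(residuals, k),
        "Q_abs": box_pierce_q([abs(z) for z in residuals], k),
        "Q_sq": box_pierce_q([z * z for z in residuals], k),
    }

random.seed(0)
simulated = [random.gauss(0.0, 1.0) for _ in range(1000)]  # stand-in for fitted residuals
print(arch_q_statistics(simulated, k=20))
```

For genuinely i.i.d. input, all three statistics should usually be unremarkable relative to a chi-squared reference distribution.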

The asymptotic distributions of $Q_k^{|z|}$ and $Q_k^{z^2}$ are often assumed to be 
chi-squared distributions, by analogy with the theory for homoskedastic processes. 
Li and Mak (1994) show the analogy is imperfect. They prove that comparing 
$Q_k^{z^2}$ with $\chi^2_k$ is a conservative test procedure and they present an alternative 
quadratic form of the autocorrelations $R_{\tau,\hat z^2}$ whose asymptotic distribution is 
$\chi^2_k$. The popular and simpler method is to follow the Box-Pierce approach and 
compare $Q_k^{|z|}$ and/or $Q_k^{z^2}$ against $\chi^2_{k-m}$, with m counting a relevant number of 
estimated parameters, assuming k > m. The theoretical results of McLeod and Li 
(1983) suggest the estimated parameters in the conditional mean equation can be 
ignored and thus m can be the number of estimated parameters in the conditional 
variance equation. The Monte Carlo evidence of Bollerslev and Mikkelsen (1996) 
shows that this ad hoc adjustment can be recommended when the fraction m/k is 
small. 

The value of k is rather arbitrary. When k = 20, the statistics $Q_{20}$, $Q_{20}^{|z|}$, and 
$Q_{20}^{z^2}$ for the DM/\$ data and the GARCH(1, 1) model estimated in Section 9.4 
are respectively 27.17, 35.94, and 14.74. Then $Q_{20}^{|z|}$ exceeds the 5% critical value 
of 27.59 from $\chi^2_{17}$, although a simple explanation of this result is not available 
by inspecting the autocorrelations of the quantities $|\hat z_t|$. For the S&P 100 index 
data and the GJR(1, 1)-MA(1)-M model estimated in Section 9.8, the statistics 
$Q_{20}$, $Q_{20}^{|z|}$, and $Q_{20}^{z^2}$ are respectively 27.22, 15.24, and 13.49. None of these values 
suggests the model is unsatisfactory, although the results in Taylor (2002) indicate 
that a long memory ARCH model is superior for S&P 100 index returns. 

An alternative to the portmanteau test procedure described above is to assess 
moment conditions one at a time, as in Nelson (1991) and Bollerslev et al. (1994). 
There are several relevant functions f for which 


$$E[f(z_t)] = 0 \qquad (10.41)$$


when the model is correctly specified, including $z_t$, $z_t^2 - 1$, $z_t z_{t+\tau}$, $z_t(z_{t+\tau}^2 - 1)$, 
and $(z_t^2 - 1)(z_{t+\tau}^2 - 1)$ with $\tau$ positive. An elementary test for a correct 
specification is given by comparing $\sqrt{n}\,\bar x/s$ against the standard normal distribution, 
with $\bar x$ and $s$ the average and the standard deviation of $x_t = f(\hat z_t)$, providing the 
variance of $f(z_t)$ is finite. This test procedure ignores the impact of parameter 
estimation error and hence is only approximately valid; estimation error is likely 
to be a serious issue when $f(z_t)$ is either $z_t$ or $z_t^2 - 1$. Diagnostic tests about the 
distribution of the $z_t$ can also be performed. Symmetry can be assessed by 
setting $f(z_t)$ to either $z_t^3$ or $z_t|z_t|$ and particular distributions can be assessed using 
$f(z_t) = |z_t| - E[|z_t|]$, etc. 
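
The elementary moment test described above needs only the sample mean and standard deviation of the transformed residuals. The following sketch assumes the sample standard deviation uses the divisor n - 1, a choice the text does not specify, and it ignores parameter estimation error exactly as the elementary test does. The helper names are illustrative.

```python
import math

def moment_test_statistic(values):
    """sqrt(n) * mean / standard deviation, to be compared with a standard normal."""
    n = len(values)
    mean = sum(values) / n
    std_dev = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return math.sqrt(n) * mean / std_dev

def squares_minus_one(residuals):
    """Terms for the moment condition that the squared residual has mean one."""
    return [z * z - 1.0 for z in residuals]

def lagged_squared_products(residuals, lag):
    """Products of (z_t^2 - 1) and (z_{t+lag}^2 - 1) for a positive lag."""
    return [
        (residuals[t] ** 2 - 1.0) * (residuals[t + lag] ** 2 - 1.0)
        for t in range(len(residuals) - lag)
    ]

# Tiny deterministic illustration; real use would pass fitted standardized residuals.
print(moment_test_statistic([1.0, 2.0, 3.0, 4.0, 5.0]))
```

Each moment condition is tested separately, so the result shows which feature of the model fails, something a single portmanteau statistic cannot do.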

Lagrange multiplier (LM) tests provide diagnostic information for particular 
alternatives to a candidate model and avoid the effort of maximizing the likelihood 
function for the alternative model. Robust LM tests are described by Bollerslev 
and Wooldridge (1992). LM tests and theory are also covered in Engle and Ng 
(1993), Bollerslev et al. (1994), and Franses and van Dijk (2000). 

ARCH models are usually estimated from prices recorded during several years. 
It is then possible that the parameters of the most appropriate model vary within the 
time period considered. Splitting the dataset into two sections and then comparing 
parameter estimates across the two subperiods provides some information about 
the constancy of parameters. The sum of the maximized log-likelihoods for the 
subperiods minus the maximum for the complete dataset can be used to find a 
likelihood-ratio statistic and hence to assess the hypothesis of constant parameters. 
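
Written out, the statistic described in this paragraph takes the following form. The symbols $L_1$, $L_2$, and $L$ for the maximized log-likelihoods of the two subperiods and the complete dataset are notation introduced here for illustration. The chi-squared reference distribution, with degrees of freedom equal to the number $P$ of parameters allowed to change, assumes the usual regularity conditions hold.

```latex
\[
  \mathrm{LR} \;=\; 2\,\bigl[\,L_1(\hat\theta_1) + L_2(\hat\theta_2) - L(\hat\theta)\,\bigr]
  \;\approx\; \chi^2_P
  \quad \text{under the hypothesis of constant parameters.}
\]
```

The statistic cannot be negative, because fitting the two subperiods separately can never give a lower total log-likelihood than imposing common parameters.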


10.6.2 Foreign Exchange Example 


A model for DM/\$ volatility is selected in Taylor (1994a) after comparing 
plausible specifications, testing various hypotheses, and obtaining diagnostic 
information. The data are daily returns from futures contracts traded in Chicago, defined 
as changes in the price logarithm. There are 3283 returns in the complete dataset, 
from December 1977 to November 1990. 

The conditional mean depends on the day of the week and is defined by five 
dummy variables, one for each day of the week. The conditional variance is 
higher for returns that include either a weekend or vacation period. Let $h_t^*$ be 
the appropriate conditional variance if period t defines a 24-hour return. Then 
suppose the trading period conditional variance $h_t$ is the following multiple of 
$h_t^*$: 

$$\frac{h_t}{h_t^*} = \begin{cases} 1 & \text{if close } t \text{ is 24 hours after close } t-1, \\ M & \text{if } t \text{ falls on a Monday and } t-1 \text{ on a Friday}, \\ V & \text{if a vacation occurs between close } t \text{ and close } t-1, \end{cases}$$

with M and V parameters. A standard specification for $h_t^*$ then completes the 
model. This is defined by writing one of the standard specifications for $h_t$ as a 
function of previous terms $h_{t-i}$ and $z_{t-i}$, $i > 0$, followed by replacing all terms 
$h_{t-j}$ by $h_{t-j}^*$, $j \ge 0$. See equation (9.39) for the GARCH(1, 1) case, while for 
the EGARCH(1) case replace terms $h_{t-j}$ by $h_{t-j}^*$ in (10.1). 
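
The calendar adjustment is a simple scaling of the 24-hour variance. The sketch below is an assumption-laden illustration: the function names are invented, and it assumes the vacation multiplier takes priority when a period spans both a weekend and a vacation, a case the text does not address. The default multipliers are the benchmark estimates reported later in this section.

```python
def calendar_multiplier(follows_weekend, spans_vacation,
                        monday_multiplier=1.41, vacation_multiplier=1.69):
    """Multiple of the 24-hour conditional variance for one trading period.

    Assumption: a vacation overrides the weekend rule when both apply.
    """
    if spans_vacation:
        return vacation_multiplier
    if follows_weekend:
        return monday_multiplier
    return 1.0

def trading_period_variance(variance_24h, follows_weekend, spans_vacation):
    """Scale the 24-hour conditional variance by the calendar multiplier."""
    return calendar_multiplier(follows_weekend, spans_vacation) * variance_24h

# A Monday return following Friday trade, with a 24-hour variance of 0.5.
print(trading_period_variance(0.5, follows_weekend=True, spans_vacation=False))
```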

The initial comparisons of maximized log-likelihoods are made between the 
GARCH(1, 1) model (parameters $\omega$, $\alpha$, $\beta$) and the symmetric EGARCH(1) model 
(parameters $\mu_{\log(h)}$, $\gamma$, $\Delta$) and between conditional normal, t- and GED 
distributions. The normal models all have ten parameters, three in the conditional variance 
equation and seven calendar parameters; the nonnormal models have an additional 
shape parameter. The GED is superior to the t-distribution for this dataset. The 
maximum log-likelihood is 10.5 more for GARCH-GED than for GARCH-t and 
it is 9.3 more for EGARCH(1)-GED than for EGARCH(1)-t. The exponential 
specification fits the data more accurately than the GARCH specification. The 
log-likelihood advantage of the EGARCH(1) model is 5.0 and 6.2 respectively 
for GED and t-distributions. These comparisons and others for subperiods all 
favor the EGARCH(1)-GED specification, which is now called the benchmark 
model. Its parameter estimates for the complete dataset include tail-thickness 
$\hat\eta = 1.32$, persistence $\hat\Delta = 0.9658$, Monday multiplier $\hat M = 1.41$, and vacation 
multiplier $\hat V = 1.69$. 

The benchmark model is compared with alternatives by using the maximum 
log-likelihood for the alternative minus the maximum log-likelihood for the benchmark, 
denoted $\Delta L$. The simpler alternative of conditional normal distributions has 
$\Delta L = -73.27$, hence the likelihood ratio test statistic is $-2\Delta L = 146.54$, which rejects 
the normal hypothesis when compared with $\chi^2_1$. Another simpler alternative is a 
unit root in volatility, $\Delta = 1$. The statistic $-2\Delta L = 38.22$ is evidence against the unit 
root hypothesis, although it is preferable to draw this conclusion from the robust 
Wald test as documented in Section 10.5. The benchmark model has a symmetric 
response to volatility shocks and has $\vartheta = 0$ in the more general equation 

$$g(z_{t-1}) = \vartheta z_{t-1} + \gamma\,(|z_{t-1}| - E[|z_{t-1}|]). \qquad (10.42)$$

The asymmetric model has $\Delta L = 0.55$, which provides no evidence that the 
shocks are asymmetric. The use of two linear functions in the volatility shock 
function merely follows tradition. However, the smoother quadratic function, 

$$g(z_{t-1}) = \vartheta z_{t-1} + \gamma\,(z_{t-1}^2 - 1),$$

fits the data worse, with $\Delta L = -10.5$. Finally, EGARCH(p, q) models with more 
parameters than the benchmark do not provide a significantly better result. For 
example, when p = 2 and q = 1, the test statistic is $2\Delta L = 1.10$, which is less 
than the median of the $\chi^2_2$ distribution. 

The benchmark model passes tests of the moment condition hypothesis, (10.41), 
for the functions $z_t$, $z_t^2 - 1$, $z_t^3$, $z_t z_{t+\tau}$, and $(z_t^2 - 1)(z_{t+\tau}^2 - 1)$, $1 \le \tau \le 10$. 
A more demanding diagnostic test compares benchmark parameter estimates for 
1977 to 1983 with those for 1984 to 1990. Adding the maximized log-likelihoods 
for the subperiods and then subtracting the maximum when the parameters are held 
constant throughout the sample period gives a test value equal to $2\Delta L = 33.08$, 
which exceeds the 5% critical value of 19.68 in the right-hand tail of $\chi^2_{11}$. Tests 
on the individual parameters suggest four changes from the first to the second 
subperiod: a fall in the variance of volatility shocks, a rise in the median level of 
volatility, and changes to two of the calendar parameters. 


10.6.3 Equity Index Example 


A model for the volatility of the S&P 500 index is selected in Bollerslev and 
Mikkelsen (1996) by comparing the values of AIC and SIC across models, 
performing robust Wald tests and evaluating the portmanteau statistics $Q_k$, $Q_k^{|z|}$, and 
$Q_k^{z^2}$. The data are 9559 daily changes in the logarithm of the index, from January 
1953 to December 1990. 

The conditional mean is specified as AR(3), so that 


$$\mu_t = \mu + \xi_1 r_{t-1} + \xi_2 r_{t-2} + \xi_3 r_{t-3}.$$


The estimates of $\xi_1$, $\xi_2$, $\xi_3$ are not sensitive to the specification of the conditional 
variance and equal 0.184, -0.057, and 0.021 for the preferred model, each with 
robust standard error equal to 0.011 approximately. The conditional variance 
incorporates a term $N_t$ that counts the number of nontrading days that are included 
in trading period t. $N_t$ is 0 for most periods, equals 1 if the market closes for a 
one-day holiday on the day before day t, and equals 2 when day t is a Monday that 
follows trade on Friday. GARCH specifications of the conditional variance are far 
inferior to EGARCH specifications, judged by the AIC and SIC criteria, which is a 
consequence of the volatility shocks having an asymmetric impact. The EGARCH 
specifications that are discussed are all special cases of the FIEGARCH(2, d, 1) 
model including the nontrading variable, 


$$\log\!\left(\frac{h_t}{1 + \delta N_t}\right) = \mu + \frac{(1 + \psi L)\, g(z_{t-1})}{(1 - \phi_1 L)(1 - \phi_2 L)(1 - L)^d}, \qquad (10.43)$$

where g is the usual volatility shock function, (10.3), and the AR(2) term has been 
factorized as $(1 - \phi_1 L)(1 - \phi_2 L)$. The authors' preferred model has $\hat\delta = 0.22$. 

Model selection commences by supposing d = 0 and considering EGARCH(1). 
The $Q_k^{z^2}$-statistic rejects this specification, by finding serial correlation in the 
squares of its standardized residuals. Furthermore, the AIC and SIC criteria are 
much higher for EGARCH(2, 1) and indeed the likelihood ratio test easily rejects 
$\phi_2 = \psi = 0$ when d = 0. The related study of the same data by Bollerslev 
et al. (1994) finds the EGARCH(2, 1) model maximizes SIC. The largest AR root of the 
EGARCH(2, 1) model is $\hat\phi_1 = 0.997$. This model is rated higher than the 
integrated special case by AIC and SIC, but a robust Wald test probably accepts the 
unit root hypothesis at the 5% level. 

The integrated EGARCH(2, 1) model has $\phi_1 = 1$ and $d = 0$, or 
equivalently $\phi_2 = 0$ and $d = 1$. It is a special case of the long memory model, 
FIEGARCH(1, d, 1), given by $\phi_2 = 0$ in (10.43). This long memory model has 
$\hat d = 0.633$ and the highest values of AIC and SIC for the set of models discussed. 
The values of $Q_{10}$ and $Q_{100}$ are 10.0 and 100.5 respectively, which provide no 
evidence against the specification of the conditional mean when compared with 
$\chi^2_7$ and $\chi^2_{97}$. The value of $Q_{10}^{z^2}$ is 15.1. There are seven estimated variance 
parameters and a comparison against $\chi^2_3$ rejects the model at the 1% level, although 
the test procedure may be suspect when there are so few remaining degrees of 
freedom. For $Q_{100}^{z^2}$ the test value of 122.7 just rejects the conditional variance 
specification at the 5% level. 


10.7 Further Volatility Specifications 


The ARCH specifications already discussed necessarily exclude many other spec- 
ifications, because the ARCH literature is vast. We now mention further speci- 
fications that are relevant because they have the potential to answer interesting 
economic questions. 


10.7.1 Univariate Specifications 


The separation of a time series into temporary and permanent components allows 
the impact of transient and fundamental information to be identified. Engle and 
Lee (1999) suppose the conditional variance h; reverts rapidly towards a perma- 
nent component that is highly persistent. Then h; is the sum of two components, 
each of which is determined by past information and the latest standardized 
residual $z_{t-1}$. An empirical example is included in Engle and Mezrich (1995). The 
autocorrelations of squared returns for stationary components models decay rapidly at 
low lags and then relatively slowly at higher lags. Component models may there- 
fore be a satisfactory alternative to long memory models. However, Bollerslev 
and Mikkelsen (1996) prefer long memory to components in perhaps the only 
empirical comparison of these concepts in an ARCH context. 

The information set $I_{t-1}$ used to define conditional distributions at time t has 
been restricted to the history of returns and calendar variables in the specific 
examples presented so far. Exogenous variables can also be included in the 
relevant information and in general we may have $I_{t-1} = \{r_{t-i}, x_{t-i}, i \ge 1\}$ with $x_t$ 
either univariate or a vector variable. 

A simple example is given by a dummy variable $x_t$ that is 1 when $t \ge T$ and 
0 when $t < T$. Time T refers to an event such as listing options on a stock. 
An additional term $\delta x_{t-1}$ can be added to the right side of any of the previously 
defined equations for $h_t$. A test of the null hypothesis $\delta = 0$ then indicates 
whether or not the event has a permanent impact on the level of volatility. Taylor 
(1994c) evaluates the test when the event is listing options in the UK on Shell 
stock and finds no evidence for a permanent effect. Similar tests by St. Pierre 
(1998) and Jubinski and Tomljanovich (2003) for large samples of US firms show 
that volatility either decreases or remains constant after options are listed. The 
alternative methodology that compares the variance of returns before and after 
the event is unsatisfactory if options are listed when volatility is not at its mean 
level. 
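
To make the dummy-variable idea concrete, the sketch below adds the extra term to a GARCH(1, 1) recursion. It is an illustration under stated assumptions, not the specification used in the cited studies: the function name, the parameter values, and the treatment of the first variance as a supplied starting value are all choices made here.

```python
def garch_with_event_dummy(residuals, dummy, omega, alpha, beta, delta, h_initial):
    """Conditional variances with an added delta * x_{t-1} term.

    h[t] = omega + alpha * residuals[t-1]**2 + beta * h[t-1] + delta * dummy[t-1],
    with h[0] set to the supplied starting value (an assumed start-up convention).
    """
    variances = [h_initial]
    for t in range(1, len(residuals)):
        variances.append(
            omega
            + alpha * residuals[t - 1] ** 2
            + beta * variances[t - 1]
            + delta * dummy[t - 1]
        )
    return variances

# Hypothetical inputs: zero residuals isolate the effect of the event dummy.
residuals = [0.0, 0.0, 0.0, 0.0]
dummy = [0, 0, 1, 1]  # the event occurs at the third observation
print(garch_with_event_dummy(residuals, dummy, 0.1, 0.05, 0.9, 0.02, 1.0))
```

In a full application the parameters, including the coefficient on the dummy, would be estimated by maximum likelihood and the null hypothesis of a zero coefficient tested with a robust t-ratio.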

A very important example occurs when the additional information x; is a mea- 
sure of asset volatility obtained from the prices of options on the asset. Several 
studies have shown that there is incremental information in option prices that 
can be used to provide more accurate predictions of subsequent volatility, com- 
mencing with Day and Lewis (1992). A logical adaptation of GARCH(I, 1), for 
example, is 


$$h_t = (1 - \beta L)^{-1}\bigl(\omega + \alpha (r_{t-1} - \mu)^2\bigr) + (1 - \delta L)^{-1} x_{t-1} \qquad (10.44)$$


with L the lag operator and $x_{t-1}$ the square of an implied volatility. We defer our 
discussion of options information and ARCH models until Chapter 15, following 
the definition of implied volatility in Chapter 14. 

Information about trading volume could be used to define $x_t$ (e.g. Lamoureux 
and Lastrapes 1990, 1994). It is essential that lagged volume, and not 
contemporaneous volume, is used to define $h_t$. A measure of unexpected volume as a 
proportion of expected volume may be advisable, because aggregate volume is 
usually not stationary. Specifications like (10.44) can then be tried. 

The standardized residuals $z_t$ are independent and identically distributed 
variables in the general ARCH framework described in Section 9.5 and in all the 
examples already described. The assumption of identical distributions can be 
relaxed by permitting the shape of the distribution to depend on the information 
$I_{t-1}$. Hansen (1994) lets the degrees of freedom of a t-distribution be a function 
of $I_{t-1}$. Dueker (1997), Harvey and Siddique (1999), Rockinger and Jondeau 
(2002), and Jondeau and Rockinger (2003) also relate the shape to the 
information. Such research is difficult because the single variable $z_t$ has to be used to 
define the conditional mean, variance, and shape. Successful applications may 
enhance value-at-risk calculations, which depend in part on the shape of the left 
tail of the distribution of $z_t$. 

Switching ARCH models also have standardized residuals that are not iden- 
tically distributed. The economy oscillates between unobservable states in these 
models. There are occasional changes in the state and then the parameters that 
determine the conditional variance also change. Two states could be used to dis- 
tinguish a normal market from a market experiencing a crisis or to distinguish a 
strong economy from a weak economy. An example of a switching ARCH model 
is 

$$h_t = \omega(S_t) + \alpha (r_{t-1} - \mu)^2 + \beta h_{t-1} \qquad (10.45)$$

with $\{S_t\}$ a Markov chain process that is stochastically independent of the 
standardized residuals. Estimation of the model provides the probabilities of the states 
at time t conditional on the information provided by observed returns (see Cai 
1994; Hamilton and Susmel 1994; Dueker 1997). Although these authors refer to 
"switching ARCH," they could also locate their models in a stochastic volatility 
(SV) framework. Indeed, the special case of (10.45) that has $\alpha = \beta = 0$ is a 
well-known SV model; its literature and properties are described in Section 11.4. 
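
The recursion in (10.45) is easy to evaluate once a path for the state is given. The sketch below takes the states as known inputs, which sidesteps the real difficulty that they are unobservable and must be inferred during estimation. The function name, the two-state intercepts, and the start-up convention are assumptions made for illustration.

```python
def switching_variances(states, returns, omega_by_state, alpha, beta, mu, h_initial):
    """Conditional variances with a state-dependent intercept.

    h[t] = omega_by_state[states[t]] + alpha * (returns[t-1] - mu)**2 + beta * h[t-1],
    with h[0] set to the supplied starting value (an assumed start-up convention).
    """
    variances = [h_initial]
    for t in range(1, len(states)):
        shock = (returns[t - 1] - mu) ** 2
        variances.append(omega_by_state[states[t]] + alpha * shock + beta * variances[t - 1])
    return variances

# Hypothetical two-state example: state 0 is calm, state 1 is a crisis.
omega_by_state = {0: 0.5, 1: 2.0}
states = [0, 0, 1, 1, 0]
returns = [0.1, -0.3, 1.5, -2.0, 0.4]

# With alpha = beta = 0 the variance depends only on the current state.
print(switching_variances(states, returns, omega_by_state, 0.0, 0.0, 0.0, 0.5))
```

Setting both ARCH coefficients to zero shows the special case mentioned above, in which the variance simply jumps between the state-specific levels.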


10.7.2 Multivariate Specifications 


ARCH models for the conditional variances and covariances of the returns from 
two or more assets have many applications. They can be used to see how common 
information affects related assets, such as market indices in different countries or 
the spot and futures price of the same asset. They can also be used to compute 
time-varying risk factors and hedge ratios, from appropriate functions of covari- 
ances and variances. Bollerslev et al. (1994) and Engle and Kroner (1995) describe 
several multivariate specifications and a similar account is given in Franses and 
van Dijk (2000). Kroner and Ng (1998) and Bekaert and Wu (2000) cover mul- 
tivariate asymmetric models. Bollerslev et al. (1992) survey the early empirical 
research. 

For N assets, let $\varepsilon_t$ now represent the N x 1 vector of residuals and let $H_t$ 
represent the N x N conditional covariance matrix, $H_t = E[\varepsilon_t \varepsilon_t' \mid I_{t-1}]$. As $H_t$ 
is symmetric, there are $\frac{1}{2}N(N+1)$ terms that need to be defined using the 
information $I_{t-1}$. We now consider multivariate generalizations of the GARCH(1, 1) 
model. The most general model, which represents each element of $H_t$ as a 
linear combination of all the elements of $\varepsilon_{t-1}\varepsilon_{t-1}'$ and all the elements of $H_{t-1}$, 
requires $\frac{1}{2}N(N+1) \times (1 + N(N+1))$ parameters and it is called a vec model 
(Engle and Kroner 1995). There are 21 parameters when N = 2. 

The large number of parameters in vec models and the requirement that the 
parameters define positive semi-definite covariance matrices motivate several 
special cases. One of these is the diagonal model of Bollerslev, Engle, and 
Wooldridge (1988), for which element (i, j) of $H_t$ is simply a linear combination 
of elements (i, j) from $\varepsilon_{t-1}\varepsilon_{t-1}'$ and $H_{t-1}$. This reduces the number of covariance 
parameters to $\frac{3}{2}N(N+1)$, i.e. to 9 when N = 2. The additional simplification 
that the conditional correlations are time-invariant gives the constant correlation 
model of Bollerslev (1990), which has only $\frac{1}{2}N(N+5)$ parameters, i.e. 7 when 
N = 2. These simplifications have the drawback that, for example, the residual 
for asset 1 does not appear in the variance equation for asset 2 and hence some 
relevant information about asset 2 may not be used. A parsimonious compromise 
between the vec and diagonal representations, which avoids this drawback, is the 
BEKK model (Baba, Engle, Kraft, and Kroner 1991). It is defined by quadratic 
forms as 

$$H_t = W + A'\varepsilon_{t-1}\varepsilon_{t-1}'A + B'H_{t-1}B \qquad (10.46)$$

with W, A, and B being N x N matrices. Matrix W is symmetric and positive 
semi-definite, which guarantees $H_t$ has the same properties. There are $\frac{1}{2}N(5N+1)$ 
parameters, i.e. 11 when N = 2. 
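
The parameter counts quoted for the four multivariate specifications can be checked, and extended to larger N, with a short calculation. The function names below are illustrative; the formulas are those stated in the text.

```python
def vec_parameter_count(n_assets):
    """General vec model: N(N+1)/2 equations, each with 1 + N(N+1) coefficients."""
    return n_assets * (n_assets + 1) // 2 * (1 + n_assets * (n_assets + 1))

def diagonal_parameter_count(n_assets):
    """Diagonal model: three coefficients for each of the N(N+1)/2 distinct elements."""
    return 3 * n_assets * (n_assets + 1) // 2

def constant_correlation_parameter_count(n_assets):
    """Constant correlation model: N(N+5)/2 parameters."""
    return n_assets * (n_assets + 5) // 2

def bekk_parameter_count(n_assets):
    """BEKK model: symmetric W plus unrestricted A and B, N(5N+1)/2 in total."""
    return n_assets * (5 * n_assets + 1) // 2

for n_assets in (2, 3, 5):
    print(
        n_assets,
        vec_parameter_count(n_assets),
        diagonal_parameter_count(n_assets),
        constant_correlation_parameter_count(n_assets),
        bekk_parameter_count(n_assets),
    )
```

The vec count grows with the fourth power of N, which shows why the restricted specifications are needed for anything beyond a few assets.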

Recently, Engle (2002) has popularized simple multivariate models that have 
dynamic conditional correlations. Another modeling strategy commences by 
assuming expected returns are linear functions of factors and that the factor returns 
follow ARCH processes (see Engle, Ng, and Rothschild 1990; Ng, Engle, and 
Rothschild 1992; King, Sentana, and Wadhwani 1994). 


10.8 Concluding Remarks 


ARCH modeling has rapidly become a dominant paradigm when discrete-time 
models are used to describe the prices of financial assets. It is easy to obtain max- 
imum likelihood estimates of parameters and to compare alternative model spec- 
ifications. This explains why ARCH models are often preferred to other volatility 
models that can also explain the stylized facts for returns. 

The selection of a specific model for the returns from a financial asset involves 
choices that can be guided by estimating models for a sample of returns. A long 
memory model should be considered if the objective is to obtain a high likelihood 
value from a dataset by fitting a model that has few parameters. The FIEGARCH 
model described in Section 10.3 is a promising example. Relevant standard errors, 
hypothesis tests, and diagnostic criteria should always be evaluated, as surveyed in 
Sections 10.4—10.6. Successful new specifications may well emerge in the future. 

There is much more that can be written about ARCH models. A detailed descrip- 
tion of multivariate models is beyond the scope of this text and the interested reader 
should see Engle and Kroner (1995), Kroner and Ng (1998), and Engle (2002). 
ARCH models for intraday returns are important and are covered in Chapter 12. 
The continuous-time limits of discrete-time models are also important and they are 
documented in Chapter 13. Applications to option pricing, volatility forecasting 
and density estimation are evaluated in Chapters 14-16.
10.9 Appendix: Formulae for the Score Vector


Suppose there are p parameters and that only the first m of these appear in the
equations that define \(\mu_t(\theta)\) and \(h_t(\theta)\), with the remaining parameters (if any)
defining the density function of the standardized residuals \(z_t(\theta)\). Analytic standard
errors can be calculated from the p x 1 score vector, \(s_t(\theta) = \partial l_t/\partial\theta\), if analytic
formulae are available for \(\partial\mu_t/\partial\theta\) and \(\partial h_t/\partial\theta\). The general formula for the first
m terms in the score vector is

\[s_{i,t}(\theta) = \frac{\partial l_t}{\partial\theta_i} = \frac{a(z_t) z_t}{\sqrt{h_t}} \frac{\partial\mu_t}{\partial\theta_i} + \frac{a(z_t) z_t^2 - 1}{2h_t} \frac{\partial h_t}{\partial\theta_i}, \qquad 1 \le i \le m, \tag{10.47}\]

with the function \(a(\cdot)\) determined by the density function of the \(z_t\). When this
density is normal,

\[a(z_t) = 1,\]

when it is the standardized t with \(\nu\) degrees of freedom,

\[a(z_t) = \frac{\nu + 1}{\nu - 2 + z_t^2},\]

and when it is the generalized error distribution with tail-thickness parameter \(\eta\),

\[a(z_t) = \begin{cases} \dfrac{\eta\,|z_t|^{\eta - 2}}{2\lambda(\eta)^{\eta}}, & \text{when } z_t \ne 0, \\[1ex] 0, & \text{when } z_t = 0, \end{cases}\]

with \(\lambda(\cdot)\) defined by (9.46). The final term in the score vector for the standardized
t is

\[\frac{\partial l_t}{\partial\nu} = \frac{d\log c(\nu)}{d\nu} - \frac{1}{2}\log(x_t) + \frac{(\nu + 1) z_t^2}{2x_t(\nu - 2)^2}, \qquad x_t = 1 + \frac{z_t^2}{\nu - 2},\]

and for the GED it equals

\[\frac{\partial l_t}{\partial\eta} = \frac{d\log C(\eta)}{d\eta} - \frac{1}{2}\left|\frac{z_t}{\lambda(\eta)}\right|^{\eta} \left(\log|z_t| - \log\lambda(\eta) - \eta\,\frac{d\log\lambda(\eta)}{d\eta}\right),\]

with \(c(\cdot)\) and \(C(\cdot)\) defined by (9.44) and (9.46).


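Recursive formulae for \(\partial\mu_t/\partial\theta\) and \(\partial h_t/\partial\theta\)
can be written down for the GARCH model and many of its extensions. For
example, for the GJR(1, 1)-MA(1)-M model defined and estimated in Section 9.8,
with residuals \(e_{t-1}\) and conditional normal distributions, m = p = 7,

\[\theta = (\mu, \lambda, \Theta, \omega, \alpha, \alpha^-, \beta)',\]

\[\frac{\partial\mu_t}{\partial\theta} = \left(1, \sqrt{h_t}, e_{t-1}, 0, 0, 0, 0\right)' - \Theta\,\frac{\partial\mu_{t-1}}{\partial\theta} + \frac{\lambda}{2\sqrt{h_t}}\,\frac{\partial h_t}{\partial\theta}, \tag{10.48}\]

and

\[\frac{\partial h_t}{\partial\theta} = \left(0, 0, 0, 1, e_{t-1}^2, S_{t-1}e_{t-1}^2, h_{t-1}\right)' - 2(\alpha + \alpha^- S_{t-1})\,e_{t-1}\,\frac{\partial\mu_{t-1}}{\partial\theta} + \beta\,\frac{\partial h_{t-1}}{\partial\theta}. \tag{10.49}\]

When an additional parameter defines the density of the standardized residuals,
equations (10.48) and (10.49) define the first seven terms of the vectors and the
eighth terms are zero.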
11 Stochastic Volatility Models


Stochastic processes for volatility and hence returns are defined and investigated 
in this chapter. These models have a simple structure and can explain the major 
stylized facts for asset returns. Their parameters can be estimated in many ways, 
although the most efficient methods are rather complicated. 


11.1 Introduction 


Volatility changes are so frequent that it is appropriate to model volatility by a 
random variable. We now do this in a discrete-time framework, although it is also 
instructive to consider continuous-time models when pricing options as we will 
see later in Chapter 14. Volatility cannot be observed directly from discrete-time 
returns data because it is a latent variable that is not traded. It can, however, 
be estimated fairly accurately from high-frequency data, as is shown in the next 
chapter. All estimates are imperfect and we have to interpret volatility as a latent 
variable that can be modeled and predicted through its direct influence on the 
magnitude of returns. 

Stochastic volatility (SV) models involve specifying a stochastic process for 
volatility. They therefore differ from ARCH models that specify a process for 
the conditional variance of returns. The SV literature has its origins in Rosenberg 
(1972), Clark (1973), Taylor (1982b), and Tauchen and Pitts (1983) and has grown 
less rapidly than the comparable ARCH literature that developed from Engle 
(1982) and Bollerslev (1986). The reason for the greater popularity of ARCH 
models is quite simply that maximum likelihood estimation is easy for ARCH but 
difficult for SV models. Nevertheless, SV models arise naturally when pricing 
options in a world of changing volatility. SV and ARCH models explain the same 
stylized facts and have many similarities. Each family of models has constructive 
applications, so that efforts to solve the difficult problem of deciding which is 
best are probably misguided. 

Shephard (1996) provides an excellent introductory survey of SV and ARCH 
models. He remarks that the properties of SV models are easier to find, under- 
stand, manipulate, and generalize to the multivariate case. Subsequent develop- 
ments in the SV literature have largely concentrated on methods for estimating 
model parameters and the unobservable volatility variable. A variety of ingenious
algorithms are reviewed in Sections 11.6 and 11.9. Ghysels, Harvey, and Renault 
(1996) provide a more mathematical survey of SV models, which covers both 
discrete and continuous-time formulations. They also include a detailed discus- 
sion of option prices motivated by the SV framework. Shephard (2005) is a recent 
collection of important SV papers, which contains a review of the SV literature. 

Further motivation for SV models and a general definition are provided in 
Section 11.2. Excess returns are defined to be the product of volatility and an 
i.i.d. standardized variable. Mathematical analysis is much easier when the two 
variables in the product are stochastically independent. General results for this 
assumption are given in Section 11.3 and applied to very different volatility pro- 
cesses in Sections 11.4 and 11.5. The first volatility process is a finite-state Markov 
chain while the second is a Gaussian AR(1) process for the logarithm of volatility. 
The Gaussian specification defines what we call the standard SV model because
it appears in most SV studies, commencing with Taylor (1982b). There is a
variety of methods, discussed in Section 11.6, for estimating the parameters of 
the standard model. A foreign exchange example is provided in Section 11.7 for 
a straightforward estimation method that applies the Kalman filter. 

The assumptions of the standard SV model are relaxed in Sections 11.8 and 
11.9. First we permit heavier tails in the distribution of returns than are given by 
conditional normal distributions. Then we introduce asymmetric effects into the 
volatility process by relaxing the assumption of stochastic independence between 
the volatility and standardized processes. The chapter continues with a description 
of long memory specifications in Section 11.10 and multivariate SV models in 
Section 11.11, followed by some notes on comparing and combining SV and 
ARCH models in Section 11.12. 


11.2 Motivation and Definitions 


Stochastic volatility models suppose the volatility on day t, denoted by \(\sigma_t\), is
partially determined by unpredictable events on the same day. Volatility is
proportional to the square root of the number of news items in the information arrivals
model of Section 8.3. There is then an unpredictable component in \(\sigma_t\) as invariably
some news is not scheduled (Taylor 1986). Volatility in this framework will be
autocorrelated whenever the news counts are autocorrelated and then the
stochastic properties of \(\{\sigma_t\}\) merit investigation.

Related motivation comes from the concept of time deformation, for which the 
trading clock runs at different rates on different days with the clock represented 
by either transaction counts or trading volume (Clark 1973; Ghysels et al. 1998; 
Ané and Geman 2000). Shocks to volume then create an unpredictable volatility 
component. 
A third source of motivation comes from approximations to diffusion processes
for a continuous-time volatility variable. Volatility diffusion processes are plau- 
sible when deriving option pricing formulae without the assumption of constant 
volatility. Some of these formulae can only be evaluated by simulation of discrete- 
time volatility processes (Hull and White 1987). 

The above remarks motivate study of stochastic volatility (SV) models, which
we require to have two properties. First, returns in excess of a constant mean \(\mu\)
can be factorized as

\[r_t - \mu = \sigma_t u_t \tag{11.1}\]

with \(\sigma_t\) positive. Second, there is an unpredictable component in volatility, i.e.
\(\mathrm{var}(\sigma_t \mid r_{t-1}, r_{t-2}, \ldots) > 0\). The random variables \(u_t\) are assumed to be
independent and identically distributed (i.i.d.) with zero mean and unit variance. Often,
it is also assumed that the \(u_t\) are normally distributed, otherwise the factorization
may not be unique, as noted below. We consider various stochastic processes for
\(\{\sigma_t\}\) in subsequent sections. We could replace the mean \(\mu\) in (11.1) by a function
of previous returns if we wished to incorporate autocorrelation among returns
into the model, although this is not done here.

SV models are characterized by two random shocks per unit time, one of which
is \(u_t\). The other shock, say \(\eta_t\), partially determines \(\sigma_t\). For example, \(\eta_t\) may be the
residual in an ARMA model for some function of \(\sigma_t\). As there are twice as many
shocks as observed returns it is impossible to deduce the realized values of \(\sigma_t\)
and \(u_t\) from returns \(r_t\). Volatility is then latent and unobservable, which certainly
complicates the estimation of model parameters.

Analysis of the factorization (11.1) requires some assumptions about the
relationship between the stochastic processes \(\{\sigma_t\}\) and \(\{u_t\}\). We call the SV model
independent if these two processes are stochastically independent. Independent 
SV models are considered in Sections 11.3-11.8 and then we consider models 
that allow some dependence between the processes. General dependent processes 
are defined in Ghysels et al. (1996) with an assumption that each process does 
not Granger-cause the other process. 

The general ARCH model for uncorrelated returns also factorizes excess
returns, but as

\[r_t - \mu = h_t^{1/2} z_t \tag{11.2}\]

with \(z_t \sim \text{i.i.d. } D(0, 1)\) and the conditional variance \(h_t\) a function of information
known before time t. It does not follow that ARCH models are SV models, by
substituting \(\sigma_t^2 = h_t\), because there is no unpredictable volatility component in
\(h_t\), since \(\mathrm{var}(h_t \mid I_{t-1}) = 0\). However, when \(z_t\) is a mixture of normal
distributions, we can write \(z_t = m_t^{1/2} u_t\) with \(E[m_t] = 1\), \(\mathrm{var}(m_t) > 0\), and
\(m_t\) independent of both \(u_t \sim N(0, 1)\) and \(I_{t-1}\), as noted in Section 9.6. Then
\(\sigma_t = (h_t m_t)^{1/2}\) makes (11.1) and (11.2) equivalent, with \(E[\sigma_t^2 \mid I_{t-1}] = h_t\) and
\(\mathrm{var}(\sigma_t^2 \mid I_{t-1}) = h_t^2\,\mathrm{var}(m_t) > 0\). Thus ARCH models with appropriate fat-tailed
conditional distributions are SV models in our terminology.

The typical conditional variance function \(h_t\) is a very intricate function of \(I_{t-1}\)
for SV models, with \(h_t \ne \sigma_t^2\). The standardized residuals \(z_t = (r_t - \mu)/\sqrt{h_t}\)
always have zero mean and unit variance, but \(z_t \ne u_t\). Generally the \(z_t\) are not
i.i.d., which is shown for certain independent SV models in Section 11.4. It follows
that there are SV models which are not ARCH models, since the \(z_t\) are i.i.d. for
ARCH models.


11.3 Moments of Independent SV Processes 


Now suppose \(\{\sigma_t\}\) is strictly stationary and stochastically independent of \(\{u_t\}\). The
independence assumption allows us to derive general formulae for the moments
of \(r_t - \mu = \sigma_t u_t\). Any expectation \(E[f_1(\sigma_t, \sigma_{t-1}, \ldots) f_2(u_t, u_{t-1}, \ldots)]\) is simply
the product of \(E[f_1]\) and \(E[f_2]\). The assumption that the \(u_t\) are i.i.d. with zero
mean and unit variance will help to simplify \(E[f_2]\). The mean, variance, and
kurtosis of returns are

\[E[r_t] = \mu, \quad \mathrm{var}(r_t) = E[\sigma_t^2], \quad \text{and} \quad \mathrm{kurtosis}(r_t) = k_r = \frac{k_u E[\sigma_t^4]}{E[\sigma_t^2]^2}, \tag{11.3}\]

with \(k_u\) the kurtosis of \(u_t\), which is 3 for normal distributions. The formula for \(k_r\)
is given by adapting equation (8.6). These expressions and those that follow are
only defined when all the relevant moments are finite. For example, the kurtosis
of returns is finite if and only if both \(k_u\) and \(E[\sigma_t^4]\) are finite.

The returns process is uncorrelated, from (8.7). Furthermore, the excess returns,
\(r_t - \mu\), are a martingale difference process. The squares of the excess returns

\[s_t = (r_t - \mu)^2 = \sigma_t^2 u_t^2\]

have the same covariances as the squares of volatility, i.e.

\[\mathrm{cov}(s_t, s_{t+\tau}) = \mathrm{cov}(\sigma_t^2, \sigma_{t+\tau}^2), \qquad \tau > 0,\]

from (8.8). The autocorrelations of the \(s_t\) are thus a multiple of the autocorrelations
of the \(\sigma_t^2\), respectively denoted by \(\rho_{\tau,s}\) and \(\rho_{\tau,\sigma^2}\), as follows:

\[\rho_{\tau,s} = \frac{\mathrm{var}(\sigma_t^2)}{\mathrm{var}(s_t)}\,\rho_{\tau,\sigma^2}, \qquad \tau > 0.\]

Then

\[\rho_{\tau,s} = \frac{E[\sigma_t^4] - E[\sigma_t^2]^2}{k_u E[\sigma_t^4] - E[\sigma_t^2]^2}\,\rho_{\tau,\sigma^2} = \frac{k_r - k_u}{k_u(k_r - 1)}\,\rho_{\tau,\sigma^2}, \qquad \tau > 0, \tag{11.4}\]

whenever the kurtosis of returns is finite. High persistence in the volatility process
will cause \(\rho_{1,s}\) to be almost \((k_r - k_u)/(k_u(k_r - 1))\), which is bounded above by
\(1/k_u\).
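The multiplier in (11.4) is easy to evaluate numerically. The following Python sketch is only an illustration: the function name and the example kurtosis values are our own assumptions and do not come from the text.

```python
def squared_return_acf_multiplier(k_r, k_u=3.0):
    """Multiplier (k_r - k_u) / (k_u (k_r - 1)) from (11.4).

    k_r is the kurtosis of returns and k_u is the kurtosis of the
    standardized variables u_t (3 when they are normal).
    """
    if k_r < k_u:
        # E[sigma^4] >= E[sigma^2]^2, so an independent SV process has k_r >= k_u.
        raise ValueError("k_r must be at least k_u")
    return (k_r - k_u) / (k_u * (k_r - 1.0))


if __name__ == "__main__":
    # The multiplier rises with the kurtosis of returns but stays below 1/k_u.
    for k_r in (3.0, 4.0, 6.0, 12.0, 1000.0):
        print(k_r, round(squared_return_acf_multiplier(k_r), 4))
```

With normal \(u_t\) the multiplier is zero when returns have kurtosis 3 and approaches, but never reaches, 1/3 as the kurtosis of returns increases.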
Similar results can be derived for any positive power p of the absolute excess
returns,

\[a_t = |r_t - \mu| = \sigma_t |u_t|.\]

Let

\[A(p) = \frac{E[\sigma_t^{2p}]}{E[\sigma_t^{p}]^2} \quad \text{and} \quad B(p) = \frac{E[|u_t|^{2p}]}{E[|u_t|^{p}]^2}. \tag{11.5}\]

Then by modifying (8.8) and (11.4) we can obtain the autocorrelations of \(a_t^p\) as
a multiple of those of \(\sigma_t^p\); thus,

\[\rho_{\tau,a^p} = C(p)\,\rho_{\tau,\sigma^p}, \qquad \tau > 0,\]

with

\[C(p) = \frac{A(p) - 1}{A(p)B(p) - 1}. \tag{11.6}\]

Note that A(p) and C(p) are functions of p and the parameters of the process
\(\{\sigma_t\}\). Taylor (1986) derives (11.6) when p is either 1 or 2.

For normally distributed \(u_t\),

\[E[|u_t|^p] = 2^{p/2}\pi^{-1/2}\,\Gamma((p + 1)/2) \tag{11.7}\]

and hence B(p) can be evaluated. In particular, \(E[|u_t|] = \sqrt{2/\pi}\), \(E[|u_t|^3] = 2\sqrt{2/\pi}\),
\(B(1) = \pi/2\), and \(B(2) = 3\), as \(\Gamma(1) = \Gamma(2) = 1\).


11.4 Markov Chain Models for Volatility 


A discrete probability distribution that has only two possible outcomes defines 
the simplest nontrivial distribution for volatility. A two-state Markov chain then 
defines the simplest stochastic process for volatility. This volatility model can 
explain the major stylized facts for returns and it is constructive in two ways. First, 
it provides some intuition for models and likelihood methods when volatility has 
a continuous distribution. Second, it illustrates some of the differences between 
ARCH and SV models. The two-state Markov chain model is quite popular in
the research literature, although it describes returns less successfully than models that
have a continuous distribution for volatility. We present a detailed collection of 
results when volatility has two states and then illustrate parameter estimation for 
DM/$ returns. Finally, we mention extensions that consider three or more states. 

A two-state model for volatility is derived in Ball and Torous (1983) by sup- 
posing prices contain occasional jumps. A two-state Markov chain for volatility is 
evaluated in Hamilton (1988) and Pagan and Schwert (1990). Further theoretical 
and empirical results are provided by Pagan (1996) and Taylor (1999). 

Models with several volatility states are investigated by Ryden et al. (1998) and 
the moments of these and more complex models are described by Timmermann 
(2000). The model for returns is often called a hidden Markov model (HMM), 
as the Markov variable is unobservable volatility. Many theoretical results for
HMMs are provided by Lindgren (1978) and Hamilton (1994, Chapter 22). 


11.4.1 Definition of the Two-State Model 


The volatility \(\sigma_t\) for period t has distribution given by

\[\sigma_t = \begin{cases} \sigma_L & \text{with probability } p, \\ \sigma_H & \text{with probability } 1 - p, \end{cases} \tag{11.8}\]

with the subscripts "L" and "H" referring to low and high volatility states, so \(\sigma_L < \sigma_H\).
Volatility oscillates between the two states as time passes. The probability of a
change in the state only depends on the latest state when \(\{\sigma_t\}\) is a Markov process.
The probabilities of up and down changes are respectively denoted by

\[p_{LH} = P(\sigma_t = \sigma_H \mid \sigma_{t-1} = \sigma_L) \quad \text{and} \quad p_{HL} = P(\sigma_t = \sigma_L \mid \sigma_{t-1} = \sigma_H). \tag{11.9}\]

The expected numbers of up and down changes are equal, as the volatility process
is stationary, and hence the constraint

\[p \times p_{LH} = (1 - p) \times p_{HL} \tag{11.10}\]

applies to the change probabilities.

The return for period t is

\[r_t = \mu + \sigma_t u_t, \tag{11.11}\]

with \(\mu\) a constant and with the \(u_t\) independent and identically distributed as
N(0, 1). The model for returns has five parameters, \(\mu\), \(\sigma_L\), \(\sigma_H\), p, and either \(p_{LH}\)
or \(p_{HL}\). The processes \(\{\sigma_t\}\) and \(\{u_t\}\) are assumed to be stochastically
independent. Returns have distribution \(N(\mu, \sigma_L^2)\) when volatility is low and distribution
\(N(\mu, \sigma_H^2)\) when volatility is high. Their unconditional variance is

\[\sigma^2 = \mathrm{var}(r_t) = E[\sigma_t^2] = p\sigma_L^2 + (1 - p)\sigma_H^2 \tag{11.12}\]

and their unconditional density is a mixture of normal densities,

\[f(r_t) = p\,\psi(r_t \mid \mu, \sigma_L^2) + (1 - p)\,\psi(r_t \mid \mu, \sigma_H^2), \tag{11.13}\]

with \(\psi(r \mid \mu, \Sigma^2)\) here representing the density of \(N(\mu, \Sigma^2)\). The symmetric
density \(f(r_t)\) is leptokurtic because it is a normal mixture. The kurtosis of returns
is

\[k = \mathrm{kurtosis}(r_t) = \frac{3(p\sigma_L^4 + (1 - p)\sigma_H^4)}{\sigma^4}, \tag{11.14}\]

from equation (8.6). An appropriate density for daily returns is obtained when
\(p = 3/4\), \(\sigma_L^2 = \sigma^2/2\), and \(\sigma_H^2 = 5\sigma^2/2\), from Taylor (1999). The kurtosis then
equals 5.25.
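The quoted kurtosis can be checked directly from (11.12) and (11.14). The short Python sketch below does so; the function name is an illustrative choice of ours and the variances are expressed in units of the unconditional variance.

```python
def two_state_moments(p, var_low, var_high):
    """Return (variance, kurtosis) of returns from (11.12) and (11.14).

    p is the probability of the low state; var_low and var_high are the
    squared volatilities of the low and high states.
    """
    variance = p * var_low + (1.0 - p) * var_high
    fourth_moment = 3.0 * (p * var_low ** 2 + (1.0 - p) * var_high ** 2)
    return variance, fourth_moment / variance ** 2


if __name__ == "__main__":
    # p = 3/4 with state variances equal to 0.5 and 2.5 times the overall variance.
    variance, kurtosis = two_state_moments(0.75, 0.5, 2.5)
    print(variance, kurtosis)
```

For these values the variance is 1 (as the state variances are scaled by the unconditional variance) and the kurtosis is 5.25.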
11.4.2 Autocorrelations


The returns process satisfies all the assumptions made in Section 11.3. Conse- 
quently, the returns are uncorrelated and the autocorrelations of squared excess 
returns can be obtained directly from those of squared volatility. 

Results for the variables \(\sigma_t^2\) can be derived easily from results for the variables
\((\sigma_t^2 - \sigma_L^2)/(\sigma_H^2 - \sigma_L^2)\), whose outcomes are either 0 or 1. In particular,

\[E[\sigma_t^2 \mid \sigma_{t-1}^2, \sigma_{t-2}^2, \ldots] = \sigma^2 + (1 - p_{LH} - p_{HL})(\sigma_{t-1}^2 - \sigma^2).\]

Then we can deduce

\[\sigma_t^2 = \sigma^2 + (1 - p_{LH} - p_{HL})(\sigma_{t-1}^2 - \sigma^2) + \xi_t \tag{11.15}\]

for variables \(\xi_t\) that are white noise. This follows from the Wold decomposition
of a stationary process, although the residuals \(\xi_t\) are not i.i.d. (Pagan 1996). Thus
\(\{\sigma_t^2\}\) is an AR(1) process, with autoregressive parameter

\[\phi = 1 - p_{LH} - p_{HL} \tag{11.16}\]

and autocorrelations

\[\rho_{\tau,\sigma^2} = \phi^{\tau}, \qquad \tau > 0.\]

Hence, from (11.4), the autocorrelations of the squared excess returns, \(s_t = (r_t - \mu)^2\), are

\[\rho_{\tau,s} = \frac{k - 3}{3(k - 1)}\,\phi^{\tau}, \qquad \tau > 0, \tag{11.17}\]

with the kurtosis of returns k given by (11.14). We may also note that \(\{\sigma_t^{\lambda}\}\) is an
AR(1) process with AR parameter \(\phi\) for all positive \(\lambda\) and hence the
autocorrelations of \(s_t^{\lambda/2}\) can be derived from (11.6).

The persistence parameter \(\phi\) must be almost 1 for daily returns, as already seen
in Chapters 9 and 10. Therefore both of the change probabilities, \(p_{LH}\) and \(p_{HL}\),
must be small because their sum is \(1 - \phi\). Consequently, changes of state must be
rare if the model is applied to daily returns (Pagan 1996). For the illustrative values
\(\phi = 0.98\) and \(p = 3/4\), equation (11.10) gives \(p_{LH} = 0.005\) and \(p_{HL} = 0.015\).
The expected time until an up change is then \(1/p_{LH} = 200\) time units when the
volatility is low, which is about ten months for daily returns. Also supposing that
\(\sigma_H^2 = 5\sigma_L^2\), \(k = 5.25\), and the autocorrelations of the squared excess returns decay
geometrically from \(\rho_{1,s} = 0.173\).
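The illustrative numbers in this subsection follow from (11.10), (11.16), and (11.17). A minimal Python sketch, with function names that are our own and not part of the text, reproduces them.

```python
def transition_probabilities(p, phi):
    """Solve (11.10) and (11.16) for the change probabilities (p_LH, p_HL).

    p * p_LH = (1 - p) * p_HL and p_LH + p_HL = 1 - phi together give
    p_LH = (1 - p)(1 - phi) and p_HL = p(1 - phi).
    """
    p_lh = (1.0 - p) * (1.0 - phi)
    p_hl = p * (1.0 - phi)
    return p_lh, p_hl


def squared_return_acf(tau, k, phi):
    """Autocorrelation of squared excess returns at lag tau, from (11.17)."""
    return (k - 3.0) / (3.0 * (k - 1.0)) * phi ** tau


if __name__ == "__main__":
    p_lh, p_hl = transition_probabilities(p=0.75, phi=0.98)
    print(p_lh, p_hl)                       # change probabilities
    print(1.0 / p_lh)                       # expected time until an up change
    print(squared_return_acf(1, 5.25, 0.98))
```

The same two expressions for the change probabilities appear in cells H9 and H10 of the spreadsheet summarized in Table 11.1.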


11.4.3 Conditional State Probabilities 


It is impossible to identify the volatility state at any time if the only data we have
are provided by returns. Instead, we have to settle for conditional probabilities.
Let H be a set of n returns, \(\{r_1, \ldots, r_n\}\), let \(I_t = \{r_1, \ldots, r_t\}\) be the usual history
up to and including time t, and let \(J_t = \{r_t, \ldots, r_n\}\) be the current and future
values at time t. Then conditional probabilities for the low state, \(P(\sigma_t = \sigma_L \mid A)\), can
be obtained when A is any of H, \(I_{t-1}\), \(I_t\), \(J_t\), \(J_{t+1}\), and \(I_{t-1} + J_{t+1}\), from equations
in Lindgren (1978), Hamilton (1988), and Taylor (1999).

We focus on the conditional state probabilities, given prior returns, denoted

\[p_t = P(\sigma_t = \sigma_L \mid I_{t-1}) \quad \text{and} \quad q_t = P(\sigma_t = \sigma_H \mid I_{t-1}). \tag{11.18}\]

The derivation supposes we already have \(p_{t-1}\) and \(q_{t-1}\) from \(I_{t-2}\). These prior
probabilities for \(\sigma_{t-1}\) can be updated by Bayes' theorem when \(r_{t-1}\) becomes
available, to give the posterior probabilities

\[p_{t-1}^* = P(\sigma_{t-1} = \sigma_L \mid I_{t-1}) = \frac{p_{t-1}\,\psi(r_{t-1} \mid \mu, \sigma_L^2)}{p_{t-1}\,\psi(r_{t-1} \mid \mu, \sigma_L^2) + q_{t-1}\,\psi(r_{t-1} \mid \mu, \sigma_H^2)} \tag{11.19}\]

and \(q_{t-1}^* = 1 - p_{t-1}^*\), with \(\psi(\cdot)\) again the normal density function. The next pair
of prior probabilities follow from the transition probabilities of the Markov chain.
In particular,

\[p_t = p_{t-1}^*(1 - p_{LH}) + q_{t-1}^*\,p_{HL}.\]

Combining the two previous equations provides the recursive formula,

\[p_t = \frac{p_{t-1}\,\psi(r_{t-1} \mid \mu, \sigma_L^2)(1 - p_{LH}) + q_{t-1}\,\psi(r_{t-1} \mid \mu, \sigma_H^2)\,p_{HL}}{p_{t-1}\,\psi(r_{t-1} \mid \mu, \sigma_L^2) + q_{t-1}\,\psi(r_{t-1} \mid \mu, \sigma_H^2)}, \tag{11.20}\]

which commences with \(p_1 = p\).

These probabilities cluster around 0 and 1 for daily returns data, because
volatility rarely changes state. Long sequences of returns from the same state give a high
probability of correctly identifying the state on most days. When future returns
are also used, 85% of the conditional probabilities are either below 0.01 or above
0.99 for realistic parameters (Taylor 1999).
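The recursion (11.20) is a simple filter that can be coded in a few lines. The Python sketch below is one possible implementation, not the one used to produce the results in this chapter; the function names and the example returns are illustrative assumptions, and the chain is parameterized by p and the persistence parameter (so that the change probabilities follow from (11.10) and (11.16)).

```python
import math


def normal_pdf(r, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at r."""
    z = (r - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def low_state_probabilities(returns, mu, sigma_low, sigma_high, p, phi):
    """Prior probabilities p_t = P(sigma_t = sigma_L | I_{t-1}), t = 1, ..., n."""
    p_lh = (1.0 - p) * (1.0 - phi)
    p_hl = p * (1.0 - phi)
    probs = [p]                                   # the recursion starts at p_1 = p
    for r in returns[:-1]:
        p_prev = probs[-1]
        low = p_prev * normal_pdf(r, mu, sigma_low)
        high = (1.0 - p_prev) * normal_pdf(r, mu, sigma_high)
        posterior_low = low / (low + high)        # Bayes' theorem, (11.19)
        # Apply the transition probabilities of the Markov chain, giving (11.20).
        probs.append(posterior_low * (1.0 - p_lh)
                     + (1.0 - posterior_low) * p_hl)
    return probs


if __name__ == "__main__":
    # Hypothetical returns: two quiet days followed by two large moves.
    example = [0.0, 0.0, 0.03, 0.03]
    print(low_state_probabilities(example, 0.0, 0.005, 0.0112, 0.75, 0.98))
```

Small returns raise the probability of the low state and a single large return is enough to push it close to \(p_{HL}\), which illustrates why the probabilities cluster near 0 and 1.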


11.4.4 Conditional Return Distributions 


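The conditional density of \(r_t\) given the previous returns \(I_{t-1}\) is a normal mixture,

\[f(r_t \mid I_{t-1}) = p_t\,\psi(r_t \mid \mu, \sigma_L^2) + (1 - p_t)\,\psi(r_t \mid \mu, \sigma_H^2), \tag{11.21}\]

which is almost normal when \(p_t\) is almost 0 or 1. The conditional variance is
simply

\[h_t = p_t\sigma_L^2 + (1 - p_t)\sigma_H^2. \tag{11.22}\]

Clearly, \(h_t\) is neither \(\sigma_L^2\) nor \(\sigma_H^2\), and hence it differs from \(\sigma_t^2\), which is a
general feature of stochastic volatility models. As \(h_t\) is a linear function of \(p_t\), it
follows from (11.20) that \(h_t\) is an intricate nonlinear function of \(h_{t-1}\) and
\((r_{t-1} - \mu)^2\). The standardized residual at time t is

\[z_t = \frac{r_t - \mu}{\sqrt{h_t}} = \frac{\sigma_t u_t}{\sqrt{p_t\sigma_L^2 + (1 - p_t)\sigma_H^2}}.\]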
Table 11.1. Formulae used in the two-state volatility spreadsheet.


Cell   Formula

B12    =1/SQRT(2*PI())
D19    =((D18*G18*(1-$H$9))+((1-D18)*H18*$H$10))/
        ((D18*G18)+((1-D18)*H18))
E18    =D18*$H$4*$H$4+(1-D18)*$H$5*$H$5
F18    =(C18-$H$3)/SQRT(E18)
G18    =$B$12*EXP(-0.5*(((C18-$H$3)/$H$4)^2))/$H$4
H9     =(1-H6)*(1-H7)
H10    =H6*(1-H7)
H13    =SUM(I18:I2607)
H18    =$B$12*EXP(-0.5*(((C18-$H$3)/$H$5)^2))/$H$5
I18    =LN(D18*G18+(1-D18)*H18)


The variables \(\{z_t\}\) have zero mean, unit variance, and are uncorrelated. They are
not i.i.d., however, as their conditional kurtosis is a function of past values of the
process:

\[\mathrm{kurtosis}(z_t \mid I_{t-1}) = \mathrm{kurtosis}(r_t \mid I_{t-1}) = \frac{3(p_t\sigma_L^4 + (1 - p_t)\sigma_H^4)}{h_t^2}.\]

Consequently, the two-state volatility model is not an ARCH model according to
the definition we have adopted in Section 9.5.

Conditional variances for any future period, i.e. \(\mathrm{var}(r_{t+n} \mid I_t)\), \(n \ge 1\), are given
by the same methods as in Section 9.3 and equal

\[\sigma^2 + (1 - p_{LH} - p_{HL})^{n-1}(h_{t+1} - \sigma^2).\]


11.4.5 Parameter Estimation 


The likelihood L of a set of returns is the product of the conditional densities,
given by (11.21). It is easy to evaluate, in contrast to the situation when volatility
has a continuous distribution. The function \(L(r_1, \ldots, r_n \mid \mu, \sigma_L, \sigma_H, p_{LH}, p_{HL})\)
is unbounded when \(\mu\) equals any of the values \(r_t\), because then \(L \to \infty\) as
\(\sigma_L \to 0\). Consequently, the maximum likelihood estimate (MLE) of all five
parameters does not exist. The MLE of the four volatility parameters can be
obtained with the constraint that \(\mu\) is the sample mean. This method provides
consistent parameter estimates, although care is required to avoid local maxima
of the likelihood function (Ryden et al. 1998).
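The log-likelihood can be accumulated in the same pass that updates the conditional state probabilities. The following Python sketch is a hedged illustration under our own naming conventions; it is not the code or spreadsheet behind the estimates reported in this chapter, and the returns in the example are hypothetical.

```python
import math


def two_state_log_likelihood(returns, mu, sigma_low, sigma_high, p, phi):
    """Sum of the logarithms of the conditional mixture densities (11.21)."""
    p_lh = (1.0 - p) * (1.0 - phi)
    p_hl = p * (1.0 - phi)
    c = 1.0 / math.sqrt(2.0 * math.pi)
    p_t, total = p, 0.0                      # the filter starts at p_1 = p
    for r in returns:
        dens_low = c * math.exp(-0.5 * ((r - mu) / sigma_low) ** 2) / sigma_low
        dens_high = c * math.exp(-0.5 * ((r - mu) / sigma_high) ** 2) / sigma_high
        mixture = p_t * dens_low + (1.0 - p_t) * dens_high
        total += math.log(mixture)
        # Update the probability of the low state for the next period, (11.20).
        posterior_low = p_t * dens_low / mixture
        p_t = posterior_low * (1.0 - p_lh) + (1.0 - posterior_low) * p_hl
    return total


if __name__ == "__main__":
    # Hypothetical returns; the third return equals mu, so the likelihood
    # grows without bound as sigma_low shrinks towards zero.
    example = [0.01, -0.02, 0.0, 0.015]
    for sigma_low in (1e-2, 1e-4, 1e-8):
        print(sigma_low, two_state_log_likelihood(example, 0.0, sigma_low, 0.02, 0.5, 0.9))
```

The example shows the unboundedness described above, which is why a lower bound on the low-state volatility (or a fixed mean) is needed when the function is passed to a numerical optimizer.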


11.4.6 Currency Examples 


The MLE for the ten years of daily DM/$ returns from 1991 to 2000 is given by
\(\sigma_L = 0.00496\), \(\sigma_H = 0.00912\), \(p = 0.646\), and \(\phi = 0.924\), with variance ratio
\(\sigma_H^2/\sigma_L^2 = 3.38\) and transition probabilities \(p_{LH} = 0.0270\) and \(p_{HL} = 0.0493\).
The maximum log-likelihood of 9410.04 exceeds the maximum of 9396.06 for
the GARCH(1, 1) model estimated from the same data in Section 9.4. However,
the Markov chain model has one more parameter than the GARCH model and
therefore it is inappropriate to use the higher likelihood as evidence that ARCH
models are inferior. The persistence estimates are rather different for the Markov
chain and GARCH models, being respectively 0.924 and 0.991. The estimate
of \(\phi\) has a high standard error and a conventional 95% confidence interval for
\(\phi\) includes 0.96. A low persistence level for the Markov chain model is also
noted in Taylor (1999), where ten years of daily £/$ returns from 1982 to 1991
provide the estimates \(\sigma_H^2/\sigma_L^2 = 3.73\), \(p = 0.726\), \(\phi = 0.880\), \(p_{LH} = 0.0329\), and
\(p_{HL} = 0.0871\).

Exhibit 11.1.

Figure 11.1. DM/$ volatility from the two-state model.

Exhibit 11.1 shows part of an Excel spreadsheet that was used to find the MLE
for the DM/$ series, with a selection of the cell formulae provided in Table 11.1.
The log-likelihood, \(\log L(\sigma_L, \sigma_H, p, \phi)\), is maximized with the constraints
\(\sigma_L \ge 0.0001\), \(\sigma_H \ge \sigma_L\), and \(1 > p, \phi > 0\).

A time series of volatility estimates, given by the annualized conditional
standard deviations \(\hat{\sigma}_t = \sqrt{259 h_t}\), is shown in Figure 11.1. These estimates come
from (11.22) and their range, from 8.5% to 14.4%, reflects a range for the
conditional probabilities \(p_t\) from 0.049 to 0.943. Figure 11.1 can be compared with
the equivalent Figure 9.1 for the GARCH(1, 1) model. There is a tendency for
the conditional standard deviations of the Markov chain model to cluster around
the extreme values \(\sigma_L\) and \(\sigma_H\), which contrasts with the unimodal distribution of
\(\hat{\sigma}_t\) for GARCH(1, 1).
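The stated range of the annualized estimates can be recovered from (11.22) and the DM/$ parameter estimates. The Python sketch below does this arithmetic; the function name is our own and the factor 259 is the annualization constant used for these estimates.

```python
import math


def annualized_volatility(p_low, sigma_low, sigma_high, days_per_year=259):
    """Annualized conditional standard deviation sqrt(259 h_t), with h_t from (11.22)."""
    h = p_low * sigma_low ** 2 + (1.0 - p_low) * sigma_high ** 2
    return math.sqrt(days_per_year * h)


if __name__ == "__main__":
    # DM/$ estimates of the two volatility levels and the extreme values of p_t.
    for p_t in (0.943, 0.049):
        print(p_t, round(100.0 * annualized_volatility(p_t, 0.00496, 0.00912), 1))
```

The highest probability of the low state gives the lowest volatility estimate, and conversely, so the two extreme probabilities map to the two ends of the range.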


11.4.7 Extensions 


Markov chain models for volatility can be made more realistic by increasing 
the number of states. The conditional state probabilities can then be obtained 
by adapting equation (11.20) and hence it is feasible to obtain the MLE for the 
volatility parameters. Ryden et al. (1998) show how the most appropriate number
of states can be inferred by using likelihood methods. They find that a three-state 
model is preferable to a two-state model for several series of daily S&P 500 index 
returns. 

A Markov chain can be used to select an ARCH model instead of a volatility 
state. This idea defines the switching ARCH models, which we mentioned in 
Section 10.7. They are a hybrid of ARCH and SV specifications. In a similar 
manner, a Markov chain can be used to specify the parameters of an SV process;
this is explained at the end of the next section. 


11.5 The Standard Stochastic Volatility Model 


The simplest credible continuous distribution for the stochastic volatility \(\sigma_t\) is
lognormal when returns are observed daily or less often. Then \(\log(\sigma_t) \sim N(\alpha, \beta^2)\),
with \(\alpha\) and \(\beta\) parameters. The lognormal distribution is the standard choice when
a continuous distribution is used for volatility. This choice is pragmatic. It
guarantees positive outcomes for volatility (unlike a normal distribution), it permits
calculation of moments and it allows any level of excess kurtosis in returns. It
appears in Taylor (1980, 1982b) and Tauchen and Pitts (1983) and it is supported
by the early empirical study of Clark (1973). The recent evidence from several
studies of high-frequency returns provides strong empirical support for the
lognormal distribution, several years after its adoption for daily volatility; this evidence
is discussed in the next chapter.

The autocorrelations of volatility are proportional to those of absolute excess
returns for independent SV processes (equation (11.6)). This indicates that the
autocorrelations of volatility must decrease slowly, because this occurs for
samples of absolute excess returns (Section 4.10). Therefore, the simplest appropriate
stationary stochastic process for volatility is a Gaussian AR(1) process for its
logarithm,

\[\log(\sigma_t) - \alpha = \phi(\log(\sigma_{t-1}) - \alpha) + \eta_t. \tag{11.23}\]

The parameter \(\phi\) represents volatility persistence, with \(-1 < \phi < 1\). The i.i.d.
volatility residuals \(\eta_t\) have distribution \(N(0, \sigma_\eta^2)\), with \(\sigma_\eta^2 = \beta^2(1 - \phi^2)\).

The standard SV model of Taylor (1986) is given by (11.23),

\[r_t = \mu + \sigma_t u_t, \tag{11.24}\]

and two further assumptions. First the i.i.d. variables \(u_t\) are distributed as N(0, 1)
and second the processes \(\{\sigma_t\}\) and \(\{u_t\}\) are stochastically independent. The returns
process is strictly stationary, since it is the product of independent strictly
stationary processes. It is also covariance stationary, because the returns have finite
variance, as shown later.
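Simulation is a convenient way to become familiar with the standard SV model. The Python sketch below generates returns from (11.23) and (11.24); the function name, the seed, and the parameter values in the example are illustrative assumptions of ours, and the first log-volatility is drawn from the stationary distribution.

```python
import math
import random


def simulate_standard_sv(n, mu, alpha, beta, phi, seed=0):
    """Simulate n returns and volatilities from the standard SV model."""
    rng = random.Random(seed)
    sigma_eta = beta * math.sqrt(1.0 - phi * phi)    # sd of the volatility residuals
    log_sigma = rng.gauss(alpha, beta)               # stationary starting value
    returns, vols = [], []
    for _ in range(n):
        sigma = math.exp(log_sigma)
        returns.append(mu + sigma * rng.gauss(0.0, 1.0))       # (11.24)
        vols.append(sigma)
        # Gaussian AR(1) process for the logarithm of volatility, (11.23).
        log_sigma = alpha + phi * (log_sigma - alpha) + rng.gauss(0.0, sigma_eta)
    return returns, vols


if __name__ == "__main__":
    # Hypothetical daily parameters: median volatility 1%, beta = 0.4, phi = 0.98.
    r, v = simulate_standard_sv(2500, 0.0, math.log(0.01), 0.4, 0.98)
    print(min(v), max(v))
```

Because the two random number streams are drawn independently, the simulated volatility and standardized processes satisfy the independence assumption of the standard model.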

The standard model has received more attention than any other SV specification. 
Its major properties are as follows. 
* All the moments of returns are finite.

* The kurtosis of returns equals \(3\exp(4\beta^2)\).

* The correlation between the returns \(r_t\) and \(r_{t+\tau}\) is zero for all \(\tau > 0\).

* The correlation between the squared excess return \(s_t = (r_t - \mu)^2\) and \(s_{t+\tau}\)
  is positive for all \(\tau > 0\) when \(\phi\) is positive. The correlation approximately
  equals \(C\phi^{\tau}\) with C a positive function of \(\beta\).

* The autocorrelation function of \(a_t^p = |r_t - \mu|^p\) has approximately the same
  shape as that of \(s_t\) for all positive p.


More flexible models are obtained by allowing excess kurtosis in the \(u_t\) and/or
dependence between \(\sigma_{t'}\) and \(u_t\) for some values of \(t' - t\), as discussed later in
Sections 11.8 and 11.9. It should be noted that the standard SV model does not
permit volatility to react asymmetrically to price falls and price rises, because \(\sigma_t\)
is independent of the signs of all previous returns. It may also be noted that some
writers avoid referring to \(\sigma_t\) and instead consider either \(h_t = \sigma_t^2\) or \(h_t = \log(\sigma_t^2)\),
with \(h_t\) being their notation for a function of \(\sigma_t\). Such notation would be confusing
if it were used here, as we reserve the notation \(h_t\) for the conditional variance
given by \(\mathrm{var}(r_t \mid r_{t-1}, r_{t-2}, \ldots)\).

We now describe several properties of the standard SV model, including a state
space representation, its moments, and its autocorrelations. We then provide an
overview of the methods that can be used to estimate the parameter vector, which
is \(\theta = (\mu, \alpha, \beta, \phi)'\), in Section 11.6. This is followed by a survey of typical
parameter estimates.


11.5.1 State Space Representation 

For additional variables defined by

\[l_t = \log(|r_t - \mu|), \quad L_t = \log(\sigma_t), \quad \text{and} \quad \xi_t = \log(|u_t|),\]

there exists a linear state space representation for the process \(\{l_t\}\). The
measurement equation is

\[l_t = L_t + \xi_t\]

and the transition equation is

\[L_t = (1 - \phi)\alpha + \phi L_{t-1} + \eta_t. \tag{11.25}\]

The logarithm of volatility is then the unobservable state variable and application
of the Kalman filter provides information about its distribution conditional on
observed returns (Scott 1987; Nelson 1988; Harvey, Ruiz, and Shephard 1994).
The state space model is not Gaussian because \(\xi_t\) is not a normal variable. The
distribution of \(\xi_t\) has mean \(\mu_\xi = -0.63518\ldots\) and variance \(\sigma_\xi^2 = \pi^2/8\) (Wishart
1947; Scott 1987). It is skewed with a long left-hand tail. The process \(\{l_t\}\) is an
ARMA(1, 1) process, since it is the sum of an AR(1) process and an independent
i.i.d. process.



11.5.2 Density and Moments 


The unconditional density function of returns is symmetric about its mean \(\mu\).
Called the LNN density in Section 4.8, it is given by integrating over the latent
volatility variable,

\[f(r) = \int_0^\infty \psi(r \mid \mu, \sigma^2)\,\Lambda(\sigma \mid \alpha, \beta^2)\,d\sigma,\]

with \(\psi(\cdot)\) the normal density function and \(\Lambda(\cdot)\) the lognormal density function
defined by equation (3.4). This integral can only be evaluated numerically.
Figures 4.3 and 4.4 display an example of the density.

For any positive number p,

\[E[|r_t - \mu|^p] = E[\sigma_t^p]\,E[|u_t|^p]. \tag{11.26}\]

As \(\log(\sigma_t^p) = p\log(\sigma_t)\), the distribution of \(\log(\sigma_t^p)\) is \(N(p\alpha, p^2\beta^2)\) and thus

\[E[\sigma_t^p] = \exp(p\alpha + \tfrac{1}{2}p^2\beta^2). \tag{11.27}\]

All the moments given by (11.26) can be evaluated using (11.7) and (11.27). In
particular,

\[E[|r_t - \mu|] = \sqrt{2/\pi}\,\exp(\alpha + \tfrac{1}{2}\beta^2),\]
\[\mathrm{var}(r_t) = \exp(2\alpha + 2\beta^2), \tag{11.28}\]
\[\mathrm{kurtosis}(r_t) = 3\exp(4\beta^2).\]

11.5.3 Autocorrelations 


The excess returns \(r_t - \mu\) are a martingale difference and are hence uncorrelated. 

The autocorrelations of \(l_t = \log(|r_t - \mu|)\) can be derived easily from the state 
space equations (11.25). The variance of \(l_t\) is \(\beta^2 + (\pi^2/8)\), because the variances 
of \(L_t\) and \(\xi_t\) are respectively \(\beta^2\) and \(\pi^2/8\). The covariance of \(l_t\) with 
\(l_{t+\tau}\) is the same as that of \(L_t\) with \(L_{t+\tau}\) when \(\tau > 0\) and equals 
\(\beta^2\phi^\tau\). Thus \(l_t\) has autocorrelations 

\[\rho_{\tau,l} = C(0, \beta)\phi^\tau, \quad \tau > 0,\]

with 

\[C(0, \beta) = \frac{8\beta^2}{8\beta^2 + \pi^2}. \tag{11.29}\]

These autocorrelations are all positive (assuming that \(\phi\) is positive) and they decay 
geometrically at the rate \(\phi\). 

The autocorrelations of \(a_t = |r_t - \mu|\) and \(s_t = (r_t - \mu)^2\) have approximately 
the same shape, although they decay from different constants. The following 
approximations are good when \(\beta^2\) is small and/or \(\phi^\tau\) is near 1: 

\[\rho_{\tau,a} \cong C(1, \beta)\phi^\tau, \qquad \rho_{\tau,s} \cong C(2, \beta)\phi^\tau, \qquad \tau > 0,\]

with 

\[C(1, \beta) = \frac{\exp(\beta^2) - 1}{(\pi/2)\exp(\beta^2) - 1} \quad\text{and}\quad C(2, \beta) = \frac{\exp(4\beta^2) - 1}{3\exp(4\beta^2) - 1}. \tag{11.30}\]

For the typical value \(\beta = 0.4\), the constants \(C(0, \beta)\), \(C(1, \beta)\), and \(C(2, \beta)\) are 
respectively 0.115, 0.206, and 0.191. 
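
The three constants quoted for \(\beta = 0.4\) can be checked with a few lines of Python. This is only a sketch of the arithmetic in (11.29) and (11.30); the function names are hypothetical.

```python
import math

def c0(beta):
    """C(0, beta) of (11.29): scale of the autocorrelations of l_t."""
    return 8 * beta ** 2 / (8 * beta ** 2 + math.pi ** 2)

def c1(beta):
    """C(1, beta) of (11.30): scale for absolute excess returns a_t."""
    e = math.exp(beta ** 2)
    return (e - 1) / ((math.pi / 2) * e - 1)

def c2(beta):
    """C(2, beta) of (11.30): scale for squared excess returns s_t."""
    e = math.exp(4 * beta ** 2)
    return (e - 1) / (3 * e - 1)

constants = [round(c(0.4), 3) for c in (c0, c1, c2)]
print(constants)  # [0.115, 0.206, 0.191]
```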

These geometric decay formulae have the same shape as the autocorrelations 
of both \(s_t\) from a GARCH(1, 1) model and \(l_t\) from an EGARCH(1) model (Taylor 
1994a). The standard SV model has many similarities with GARCH(1, 1), yet it 
is more similar to the special case of EGARCH(1) given by symmetric reactions 
to price falls and rises. This remark is made more precise in Section 13.4 when 
we consider the diffusion limits of the SV, GARCH, and EGARCH models when 
observations are obtained more and more frequently. 

The above approximations extend to the autocorrelations of \(a_t^p\) for any positive 
value of \(p\). From (11.5), (11.7), and (11.27), \(A(p, \beta) = \exp(p^2\beta^2)\), 

\[B(p) = \frac{\sqrt{\pi}\,\Gamma(p + \tfrac{1}{2})}{[\Gamma(\tfrac{1}{2}p + \tfrac{1}{2})]^2}, \tag{11.31}\]

\[C(p, \beta) = \frac{\exp(p^2\beta^2) - 1}{B(p)\exp(p^2\beta^2) - 1}, \quad p > 0, \tag{11.32}\]

and there is the approximation 

\[\rho_{\tau,a^p} \cong C(p, \beta)\phi^\tau, \quad \tau > 0. \tag{11.33}\]

As \(p \to 0\), \(C(p, \beta)\) converges to \(C(0, \beta)\) given in (11.29). 

Figure 11.2 shows \(C(p, \beta)\) as a function of \(p\), when \(\beta = 0.2\), 0.4, 0.6, and 
0.8. When \(\beta\) is held constant, it can be seen that \(C\) increases monotonically as \(p\) 
increases until a maximum value is attained and then \(C\) decreases monotonically 
as \(p\) continues to increase. The dependence within the process \(\{a_t^p\}\) is maximized 
when the term \(C\) is maximized. These maxima occur at \(p = 1.70\), 1.27, 0.97, and 
0.75 as \(\beta\) increases from 0.2 to 0.8. 
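
The location of these maxima can be explored numerically. The sketch below evaluates (11.31) and (11.32) and searches a grid of powers; the grid step of 0.01, the upper limit of 4, and the function names are assumptions made for illustration, so the results are only accurate to the grid spacing.

```python
import math

def b_normal(p):
    """B(p) of (11.31) for standard normal shocks u_t."""
    return math.sqrt(math.pi) * math.gamma(p + 0.5) / math.gamma(0.5 * p + 0.5) ** 2

def c_const(p, beta):
    """C(p, beta) of (11.32), valid for p > 0."""
    e = math.exp(p * p * beta * beta)
    return (e - 1.0) / (b_normal(p) * e - 1.0)

def argmax_p(beta, step=0.01, p_max=4.0):
    """Grid search for the power p that maximizes C(p, beta)."""
    grid = [step * i for i in range(1, int(round(p_max / step)) + 1)]
    return max(grid, key=lambda p: c_const(p, beta))

best_powers = {beta: argmax_p(beta) for beta in (0.2, 0.4, 0.6, 0.8)}
print(best_powers)
```

The maximizing power should fall as \(\beta\) rises, in line with the pattern described above.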

The exact formulae for the autocorrelations of \(a_t^p\) can be obtained from (11.6) 
once we know the autocorrelations of \(\sigma_t^p\). As \(\log(\sigma_t^p)\) is a Gaussian AR(1) 
process, the distribution of \(\log(\sigma_t^p) + \log(\sigma_{t+\tau}^p)\) is 
\(N(2p\alpha, 2(1 + \phi^\tau)p^2\beta^2)\) and hence 

\[E[\sigma_t^p \sigma_{t+\tau}^p] = \exp(2p\alpha + (1 + \phi^\tau)p^2\beta^2), \quad \tau \geq 0. \tag{11.34}\]

It then follows that the autocorrelations of \(\sigma_t^p\) are 

\[\rho_{\tau,\sigma^p} = \frac{\exp(p^2\beta^2\phi^\tau) - 1}{\exp(p^2\beta^2) - 1}, \quad \tau \geq 0, \tag{11.35}\]

as previously shown by Granger and Newbold (1976). The autocorrelations of \(a_t^p\) 
are, from (11.6), 

\[\rho_{\tau,a^p} = \frac{\exp(p^2\beta^2\phi^\tau) - 1}{B(p)\exp(p^2\beta^2) - 1}, \quad \tau > 0. \tag{11.36}\]

This equation is in Ghysels et al. (1996), while the results for \(p = 1\) and \(p = 2\) 
are in Taylor (1986). 

When \(p^2\beta^2\) is small and/or \(\phi^\tau\) is near 1, we can approximate 
\(\exp(p^2\beta^2\phi^\tau) - 1\) by \(\phi^\tau(\exp(p^2\beta^2) - 1)\), to obtain the approximations 
given in equations (11.30) and (11.33). The exact result when \(p = 2\) is 

\[\rho_{\tau,s} = \frac{\exp(4\beta^2\phi^\tau) - 1}{3\exp(4\beta^2) - 1}, \quad \tau > 0. \tag{11.37}\]

When \(\beta = 0.4\) and \(\phi = 0.98\) the first autocorrelation of squared excess returns is 
0.1860 and the approximate value from (11.30) is 0.1873; at lag 20 the approximation 
is much less accurate, the exact and approximate values being 0.114 and 
0.128 respectively. 
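
The comparison between the exact and approximate autocorrelations of squared excess returns is easy to reproduce. The following sketch codes (11.37) and the geometric approximation from (11.30); the function names are hypothetical.

```python
import math

def rho_s_exact(tau, beta, phi):
    """Exact autocorrelation of s_t = (r_t - mu)^2, equation (11.37)."""
    return (math.exp(4 * beta ** 2 * phi ** tau) - 1) / (3 * math.exp(4 * beta ** 2) - 1)

def rho_s_approx(tau, beta, phi):
    """Geometric approximation C(2, beta) * phi^tau from (11.30)."""
    e = math.exp(4 * beta ** 2)
    return (e - 1) / (3 * e - 1) * phi ** tau

for tau in (1, 20):
    print(tau, rho_s_exact(tau, 0.4, 0.98), rho_s_approx(tau, 0.4, 0.98))
```

Because \(\exp(x\phi^\tau) - 1 \leq \phi^\tau(\exp(x) - 1)\) for positive \(x\) and \(0 < \phi^\tau \leq 1\), the approximation always overstates the exact value, and the gap widens as the lag increases.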


11.5.4 Markov Chain Alternatives and Extensions 


The variance of returns conditional upon previous returns, denoted by \(h_t\), is a 
complicated function for the standard SV model. As the density of \(\sigma_t\) is unimodal, 
it is safe to assume that the density of \(h_t\) has the same property. This contrasts 
with the situation for the two-state volatility model described in Section 11.4. The 
distribution of \(h_t\) is bimodal for that model, with most outcomes near to one of 
the two volatility levels whenever the persistence of volatility is near one. The 
marked difference in the distribution of \(h_t\) can then be used to decide between the 
standard SV model and the two-state volatility model. 
Taylor (1999) essentially proposes estimating the regression model 

\[e_t^2 = \beta_0 + \beta_1 h_t^{(1)} + \beta_2 h_t^{(2)}.\]

The conditional variance \(h_t^{(1)}\) for the two-state model is given by (11.22), while 
\(h_t^{(2)}\) obtained by estimating the GARCH(1, 1) model is used as a good 
approximation to the unknown conditional variance for the standard SV model. 
The two-state model is then preferred if and only if \(\hat{\beta}_1 > \hat{\beta}_2\). A Monte Carlo 
investigation shows that this procedure has high power to select the correct model. 
When applied to ten years of daily returns from the rate of exchange between 
sterling and the dollar, \(\hat{\beta}_1\) is almost zero while \(\hat{\beta}_2\) is near one so that the standard 
SV model is overwhelmingly favored for the data considered. 

So, Lam, and Li (1998) add Markov switching to the standard SV model. They 
replace the mean \(\alpha\) of \(\log(\sigma_t)\) by a term \(\alpha_t\) that is determined by a finite-state 
Markov chain. This model is estimated in their paper with three states for \(\alpha_t\), 
using weekly observations of the S&P 500 index. 


11.6 Parameter Estimation for the Standard SV Model 


Many methods have been proposed for estimating the parameters of the standard 
SV model, \(\theta = (\mu, \alpha, \beta, \phi)'\), from a set of \(n\) observed returns 
\(I_n = \{r_1, \ldots, r_n\}\). We suppose \(\mu\) is estimated by the average return \(\bar{r}\) and 
focus on estimating \(\alpha\), \(\beta\), and \(\phi\). MLEs can only be obtained by complicated 
methods and hence many alternative estimates have been proposed. The likelihood 
function is the product of conditional densities \(f(r_t \mid I_{t-1})\), with 
\(I_{t-1} = \{r_1, \ldots, r_{t-1}\}\), but tractable expressions for these densities are not known. 
An obvious modification of equations (11.20) and (11.21) provides the densities when 
the number of volatility states is finite. It is probable that these equations can be 
adapted to approximate the standard SV model using a large number of volatility 
states and hence the likelihood can be approximated. 

A formula for the likelihood is given by integrating the product of (a) the 
conditional density of the returns given the volatilities and (b) the density of the 
volatilities. This defines the n-dimensional integral 


\[L(r_1, \ldots, r_n \mid \theta) = \int_{\sigma_1}\int_{\sigma_2}\cdots\int_{\sigma_n} f(r_1, \ldots, r_n \mid \sigma_1, \ldots, \sigma_n)\, f(\sigma_1, \ldots, \sigma_n \mid \theta)\, d\sigma_1 \cdots d\sigma_n. \tag{11.38}\]

Both terms inside the integral can be evaluated with ease, but the exact value of the 
integral can only be evaluated by numerical methods. Fridman and Harris (1998) 
outline an efficient method for calculating an approximation to the integral that 
is based upon a discrete approximation to the distribution of \(\sigma_t\). 

There is a trade-off between the accuracy of the parameter estimates and the 
sophistication of the methods. Elementary moment matching and quasi-maximum 
likelihood (or Kalman filter) methods are straightforward. The generalized method 
of moments (GMM) methodology is a more complicated technique. The MCMC 
(or Bayesian) method advocated by Jacquier, Polson, and Rossi (1994) and the 
simulated likelihood methods of Daníelsson (1994) and Sandmann and Koopman 
(1998) are more accurate, but they are even more complicated. All these methods 
can be adapted to estimate some of the more complicated SV models discussed 
later in this chapter. 

The quasi-ML and MCMC methods also provide information about the distri- 
butions of the unobservable volatilities conditional on the observed returns, which 
is required if volatility is to be forecast into the future. Diagnostic tests for the 
adequacy of the standard SV model are fairly easy to perform for the GMM and 
quasi-ML methods. 


11.6.1 Elementary Moment Estimates 


The mean \(\alpha\) and the standard deviation \(\beta\) of \(\log(\sigma_t)\) can be estimated by 
matching two moments of the returns distribution. For example, from the sample 
second and fourth moments we obtain the standard deviation \(s\) and kurtosis \(k\) and 
hence \(\hat{\beta}^2 = \tfrac{1}{4}\log(k/3)\) and \(\hat{\alpha} = \log(s) - \hat{\beta}^2\) from (11.28), assuming \(k > 3\). The 
sample kurtosis is inaccurate and sensitive to outliers. A more robust method 
equates the sample means \(\bar{a}\) and \(\bar{s}\) of the observed quantities \(a_t = |r_t - \bar{r}|\) and 
\(s_t = (r_t - \bar{r})^2\) with the theoretical mean absolute deviation and variance. This 
provides \(\hat{\alpha} = \log(\pi\bar{a}^2/(2\sqrt{\bar{s}}))\) and \(\hat{\beta}^2 = \log(2\bar{s}/(\pi\bar{a}^2))\), again from (11.28). The 
autoregressive parameter \(\phi\) can be estimated by minimizing a goodness-of-fit 
measure for selected sample autocorrelations. One possibility is to minimize 
\(F(K, \phi) = \sum(\hat{\rho}_{\tau,a} - K\phi^\tau)^2\), summing over \(\tau = 1, \ldots, 50\) with \(\hat{\rho}_{\tau,a}\) the 
sample autocorrelations of the quantities \(a_t\). This possibility is motivated by equation 
(11.30) and we might hope that \(K = C(1, \beta)\). The above estimators are probably 
relatively inaccurate. They are evaluated in Taylor (1982b, 1986) and provide 
estimates that are generally similar to those in more recent studies. 
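
The moment-matching formulae can be sketched in Python as follows. The two estimator functions invert (11.28) as described above. The grid search over \(\phi\), with the least-squares value of \(K\) computed in closed form for each trial \(\phi\), is one assumed way of minimizing the goodness-of-fit measure and is not necessarily the procedure used in the cited studies; all names are hypothetical.

```python
import math

def estimate_from_kurtosis(s, k):
    """(alpha_hat, beta_hat) from the sample standard deviation s and kurtosis k > 3."""
    beta_sq = 0.25 * math.log(k / 3.0)
    return math.log(s) - beta_sq, math.sqrt(beta_sq)

def estimate_from_abs_and_squares(a_bar, s_bar):
    """(alpha_hat, beta_hat) from the sample means of a_t and s_t.
    Requires 2 * s_bar > pi * a_bar^2 so that beta_hat^2 is positive."""
    beta_sq = math.log(2.0 * s_bar / (math.pi * a_bar ** 2))
    alpha = math.log(math.pi * a_bar ** 2 / (2.0 * math.sqrt(s_bar)))
    return alpha, math.sqrt(beta_sq)

def fit_phi(rho_hat, grid=None):
    """Minimize sum over tau of (rho_hat[tau-1] - K * phi^tau)^2 on a grid of phi.
    For each phi the minimizing K is the least-squares slope through the origin."""
    if grid is None:
        grid = [0.900 + 0.001 * i for i in range(100)]  # 0.900, ..., 0.999
    best = None
    for phi in grid:
        powers = [phi ** tau for tau in range(1, len(rho_hat) + 1)]
        k = sum(r * w for r, w in zip(rho_hat, powers)) / sum(w * w for w in powers)
        sse = sum((r - k * w) ** 2 for r, w in zip(rho_hat, powers))
        if best is None or sse < best[0]:
            best = (sse, k, phi)
    return best[1], best[2]   # (K_hat, phi_hat)

# Check on exact population values for alpha = -5.16, beta = 0.409.
a_bar = math.sqrt(2 / math.pi) * math.exp(-5.16 + 0.5 * 0.409 ** 2)
s_bar = math.exp(2 * -5.16 + 2 * 0.409 ** 2)
print(estimate_from_abs_and_squares(a_bar, s_bar))
```

With real data the inputs would be sample statistics, so the estimates inherit their sampling error; the check above only confirms that the formulae invert (11.28) correctly.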


11.6.2 GMM 


The generalized method of moments seeks parameter values that provide theoret- 
ical moments that are close to the empirical moments, using more moments than 
there are parameters. Melino and Turnbull (1990) is an early example of the 
application of the method to estimating SV parameters. Jacquier et al. (1994) and 
particularly Andersen and Sørensen (1996) provide a detailed investigation of the 
finite sample properties of GMM estimates for the standard SV model. The first 
four moment conditions they consider are 


\[g_i(\theta) = \frac{1}{n}\sum_{t=1}^{n} |r_t - \mu|^i - E[\sigma_t^i]\,E[|u_t|^i], \quad i = 1, 2, 3, 4,\]

and the remainder are of the form 

\[g_j(\theta) = \frac{1}{n - \tau}\sum_{t=1}^{n-\tau} |r_t - \mu|^k\,|r_{t+\tau} - \mu|^k - E[\sigma_t^k\sigma_{t+\tau}^k]\,(E[|u_t|^k])^2, \tag{11.39}\]

with \(i\) and \(j\) counting the moment conditions, \(k = 1\) or 2 and \(\tau > 0\). All the 
terms in (11.39) can be evaluated using (11.7), (11.27), and (11.34). The GMM 
estimate of \(\theta\) is given by minimizing the quadratic form \(g'Wg\) for an appropriate 
weighting matrix \(W\) (see Hansen (1982) and Hamilton (1994, Chapter 14) for 
general theoretical results for the GMM method). 
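
A minimal sketch of the GMM objective follows, under several simplifying assumptions that are not taken from the cited papers: the weighting matrix \(W\) is the identity, only lags 1 to 3 are used, excess returns are supplied directly, and \(E[|u_t|^p]\) is the standard normal result assumed to match (11.7). All function names are hypothetical.

```python
import math

def abs_normal_moment(p):
    """E[|u|^p] for u ~ N(0, 1)."""
    return 2.0 ** (p / 2.0) * math.gamma((p + 1.0) / 2.0) / math.sqrt(math.pi)

def theoretical_moments(alpha, beta, phi, lags=(1, 2, 3)):
    """E|r-mu|^i for i = 1..4, then E[|r_t-mu|^k |r_{t+tau}-mu|^k] for k = 1, 2."""
    m = [math.exp(i * alpha + 0.5 * i * i * beta * beta) * abs_normal_moment(i)
         for i in (1, 2, 3, 4)]
    for k in (1, 2):
        for tau in lags:
            # E[sigma_t^k sigma_{t+tau}^k] from (11.34) with p = k.
            cross = math.exp(2 * k * alpha + (1 + phi ** tau) * k * k * beta * beta)
            m.append(cross * abs_normal_moment(k) ** 2)
    return m

def sample_moments(excess, lags=(1, 2, 3)):
    """Sample counterparts of theoretical_moments for a list of excess returns."""
    n = len(excess)
    a = [abs(x) for x in excess]
    m = [sum(x ** i for x in a) / n for i in (1, 2, 3, 4)]
    for k in (1, 2):
        for tau in lags:
            m.append(sum(a[t] ** k * a[t + tau] ** k for t in range(n - tau)) / (n - tau))
    return m

def gmm_objective(theta, sample, lags=(1, 2, 3)):
    """Quadratic form g'Wg with W equal to the identity (an assumption)."""
    g = [s - m for s, m in zip(sample, theoretical_moments(*theta, lags=lags))]
    return sum(x * x for x in g)
```

In practice the moments differ by several orders of magnitude, so an identity weighting matrix would let the low-order moments dominate; this is one reason why the choice of \(W\) matters.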

Andersen and Sørensen (1996) find a total of fourteen moment conditions is 
appropriate for the parameter values that they consider in their simulation study. 
They also evaluate many methods for choosing \(W\). They encounter some numerical 
problems when \(\phi = 0.98\), which casts doubts on the usefulness of the method 
when applied to daily returns. Shephard (1996) lists several criticisms of GMM 
in the SV context. The ad hoc selection of moment conditions can be avoided by 
using the efficient method of moments technique, applied by Gallant, Hsieh, and 
Tauchen (1997), but this is a very technical methodology. 


11.6.3 QML 


The state space representation given by (11.25) can be used to derive conditional 
means and variances for the variables \(l_t = \log(|r_t - \mu|)\) and \(L_t = \log(\sigma_t)\), after 
replacing \(\mu\) by the average return \(\bar{r}\) and employing the Kalman filter. The model 
parameters can then be estimated by maximizing a likelihood function. This easy 
method is illustrated in the next section. Harvey (1989) is an authoritative text 
about the filter; its equations are provided in the appendix to this chapter. 

Harvey et al. (1994) apply the Kalman filter to obtain the best predictions of 
\(l_t\) and \(L_t\) that are linear combinations of the information \(I_{t-1} = \{l_1, \ldots, l_{t-1}\}\). 
The best linear predictor of \(l_t\), given \(I_{t-1}\), is here denoted by \(l_{t-1,1}\) and the mean 
square error of the prediction error \(v_t = l_t - l_{t-1,1}\) is denoted by \(F_t\). The updating 
equations for the predictions are particularly simple for the standard SV model 
and take the same form as ARMA(1, 1) predictions when the filter is in its steady 
state. 

The distribution of the observations \(l_t\) is negatively skewed. The quasi-MLE 
(QMLE) of \(\alpha\), \(\beta\), and \(\phi\) is given by pretending the \(l_t\) have conditional normal 
distributions, 

\[l_t \mid I_{t-1} \sim N(l_{t-1,1}, F_t), \tag{11.40}\]

and then maximizing 

\[\log L = -\frac{1}{2}\sum_{t=1}^{n}\left(\log(2\pi) + \log(F_t) + \frac{v_t^2}{F_t}\right). \tag{11.41}\]

The QMLE is consistent and asymptotically normal. The likelihood theory for 
ARCH models described in Section 10.4 applies to the QMLE for the SV model. 
In particular, the covariance matrix of the QMLE is given by equation (10.32), 
for reasons outlined by Harvey et al. (1994) and Harvey and Shephard (1996). 

The Kalman filter also provides a prediction of \(L_t\), namely \(l_{t-1,1} - \mu_\xi\) with 
mean square error \(F_t - (\pi^2/8)\). This can be used to obtain an approximately 
unbiased prediction of \(\sigma_t\) from \(I_{t-1}\), given by 
\(\exp(l_{t-1,1} - \mu_\xi + \tfrac{1}{2}(F_t - (\pi^2/8)))\). 
Also, the Kalman smoothing algorithm can be used to estimate \(\sigma_t\) from all the 
observations \(I_n\). 

Diagnostic checks for the QML methodology rely on the result that the terms 
\(z_t = v_t/\sqrt{F_t}\) are uncorrelated, with zero mean and unit variance, when the model 
is correctly specified. One disadvantage of QML is that the variables \(l_t\) are sensitive 
to small values of \(r_t\), so that the residual \(v_t\) is a negative outlier when \(r_t \cong 0\). 
This sensitivity can be reduced by using a transformation of Breidt and Carriquiry 
(1996) discussed by Ghysels et al. (1996) and defined in Section 11.7. 
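
The quasi-log-likelihood (11.41) can be sketched in Python as below. The one-step updating recursions are transcribed from the spreadsheet formulae for cells E21 and G21 in Table 11.2 of Section 11.7, rather than from the appendix, and the function and variable names are hypothetical. The sketch assumes Gaussian \(u_t\), so that \(\mu_\xi = -0.63518\) and \(H = \pi^2/8\).

```python
import math

MU_XI = -0.63518       # mean of log|u_t| for u_t ~ N(0, 1), as quoted in the text
H = math.pi ** 2 / 8   # variance of log|u_t|

def quasi_loglik(l, alpha, beta, phi):
    """Quasi-log-likelihood (11.41) of the series l_t = log|r_t - rbar|.
    Returns the log-likelihood, the prediction errors v_t and the variances F_t."""
    sigma_eta_sq = beta ** 2 * (1.0 - phi ** 2)
    pred = alpha + MU_XI    # initial prediction: unconditional mean of l_t
    f = beta ** 2 + H       # initial mean square error F_1
    loglik, errors, variances = 0.0, [], []
    for obs in l:
        v = obs - pred
        loglik += -0.5 * (math.log(2.0 * math.pi) + math.log(f) + v * v / f)
        errors.append(v)
        variances.append(f)
        # One-step-ahead updates (cells E21 and G21 of Table 11.2).
        pred = (1.0 - phi) * (alpha + MU_XI) + phi * obs - phi * H * v / f
        f = H + phi ** 2 * H * (f - H) / f + sigma_eta_sq
    return loglik, errors, variances
```

A numerical optimizer of one's choice could then be applied to the negative of the returned log-likelihood, subject to \(\beta > 0\) and \(|\phi| < 1\). The standardized errors \(z_t\) follow by dividing each \(v_t\) by \(\sqrt{F_t}\).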


11.6.4 Simulated ML 


The QML methodology can only provide an approximation to the likelihood 
function. This function can be evaluated by a variety of Monte Carlo methods. 
Simulated likelihood values can then be maximized to provide an exact MLE 
for the standard SV model. The first Monte Carlo method was developed by 
Danielsson and Richard (1993) and Danielsson (1994), who use importance sam- 
pling. A conceptually simpler and faster algorithm is presented in Sandmann and 
Koopman (1998). They write the exact log-likelihood of the variables \(l_t\) as the 
quasi-log-likelihood (given in (11.41)) plus a remainder function. This remainder 
function can be calculated by a small number of simulations that use the Kalman 
smoother. Similar techniques are described by Durbin and Koopman (2000). 


11.6.5 MCMC 


Another way to obtain the information provided by the likelihood function uses 
Bayesian analysis. Bayesian estimation methods make use of data \(r = (r_1, \ldots, r_n)\) 
and a prior density \(f(\theta)\) for the parameters \(\theta\) to find the posterior density \(f(\theta \mid r)\). 
Likewise, although we do not observe the latent volatility process, we can seek 
the posterior density \(f(\sigma \mid r)\), with \(\sigma = (\sigma_1, \ldots, \sigma_n)\). These posterior densities 
can be obtained by the Markov chain Monte Carlo (MCMC) methodology, whose 
principles are explained and illustrated by Chib (2001) and Tsay (2002). 

MCMC methods provide the joint posterior density \(f(\sigma, \theta \mid r)\), from which 
we can obtain both \(f(\theta \mid r)\) and \(f(\sigma \mid r)\). The Markov chain, here denoted by 
\(\{X_k\}\), can be generated by Monte Carlo methods that deliver an outcome 
\(X_{k+1} = (\sigma^{(k+1)}, \theta^{(k+1)})\) from \(X_k\) and a transition density \(f(X_{k+1} \mid X_k)\). A technical 
problem is to define a transition density that ensures the chain \(\{X_k\}\) is ergodic 
with the required stationary distribution, namely \(f(\sigma, \theta \mid r)\). All solutions to this 
problem involve complicated algorithms that are beyond the scope of this book. 
The early algorithm of Jacquier et al. (1994) has a high level of dependence in the 
Markov chain so that very long Monte Carlo sequences are required to estimate 
the posterior density. Details are also provided in Jacquier, Polson, and Rossi 
(2004). More efficient algorithms are described by Shephard (1996), Shephard 
and Pitt (1997), and Kim, Shephard, and Chib (1998). These papers also discuss 
the selection of the prior density f (0). An approximation to the likelihood L(r | 0) 
is provided by the methods of Kim, Shephard, and Chib (1998). 

The logical point estimate of \(\theta\) is given by the mean of its posterior density, 
which can be estimated from a long realization of \(N\) outcomes from the Markov 
chain by 

\[\hat{\theta} = \frac{\theta^{(M+1)} + \cdots + \theta^{(N)}}{N - M},\]

with the first \(M\) outcomes discarded to diminish the influence of the arbitrary 
starting value of the chain. Jacquier et al. (1994) provide simulation results that support 
their claim that the MCMC estimate is much more accurate than the GMM and 
QMLE estimates. The covariance matrix of \(\hat{\theta}\) can be estimated after first estimating 
the autocorrelations of the terms \(\theta^{(k)}\), as in Shephard and Pitt (1997). 
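
The averaging step is simple to code once a chain of parameter draws is available. The sketch below only illustrates the discarding of the first \(M\) outcomes; it does not generate the chain, and the toy draws are made-up numbers rather than output from any of the cited algorithms.

```python
def posterior_mean(draws, burn_in):
    """Average theta^(M+1), ..., theta^(N), discarding the first M = burn_in draws."""
    kept = draws[burn_in:]
    if not kept:
        raise ValueError("burn_in must be smaller than the number of draws")
    dim = len(kept[0])
    return [sum(d[j] for d in kept) / len(kept) for j in range(dim)]

# Toy, made-up draws of (alpha, beta, phi); a real chain would be far longer.
toy_draws = [(-4.0, 0.9, 0.50), (-5.0, 0.4, 0.97), (-5.2, 0.3, 0.98), (-5.1, 0.5, 0.99)]
theta_hat = posterior_mean(toy_draws, burn_in=1)
print(theta_hat)
```

Successive draws are dependent, so the precision of this average cannot be judged from the usual formula for independent observations; the autocorrelations of the draws must be taken into account.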


11.6.6 Typical Estimates 


We only review estimates for daily returns from foreign exchange here. The stan- 
dard SV model has also been estimated for daily equity returns. However, the 
results are of limited interest for equities because the model does not allow volatil- 
ity to respond asymmetrically to price falls and price rises. 

Estimates of the persistence parameter \(\phi\) are usually between 0.95 and 0.99, 
just as they are for measures of persistence from ARCH models. Taylor (1986, 
p. 89) uses an elementary moment matching method and reports 0.985, 0.987, and 
0.989 for the DM/$, £/$, and SF/$ rates, from 1974 to 1981. The same method 
gives 0.969 for the DM/$ from 1977 to 1990, compared with 0.938 for the QMLE 
method (Taylor 1994a). Harvey et al. (1994) apply QMLE to DM/$, £/$, SF/$, 
and yen/$ rates for the shorter period from 1981 to 1985 and obtain the estimates 
0.965, 0.991, 0.957, and 0.995, with respective standard errors equal to 0.021, 
0.007, 0.002, and 0.005. 

MCMC estimates are given by Jacquier et al. (1994) for the DM/$ and £/$ 
rates from 1980 to 1990. The point estimates are 0.95 and 0.96 respectively 
and 95% posterior intervals are 0.92—0.97 and 0.94—0.98 respectively. Further 
MCMC estimates are provided by Pitt and Shephard (1999) for the longer and 
more recent period from 1981 to 1998. They study DM/$, £/$, FF/$, SF/$, and 
yen/$ rates and their persistence estimates are 0.965, 0.970, 0.947, 0.953, and 
0.841 respectively, with standard errors all below 0.004. The low estimate for the 
yen series is surprising. 

Estimates of the standard deviation \(\beta\) of \(\log(\sigma_t)\) are usually between 0.3 and 
0.7. This is the case for all of the estimates from DM/$ series that are mentioned 
above. The logarithm of the unconditional standard deviation of returns is \(\alpha + \beta^2\) 
from (11.28), with \(\alpha\) the mean of \(\log(\sigma_t)\). As the standard deviation is near 0.7% 
for FX returns, estimates of \(\alpha\) are usually between \(-5.4\) and \(-5.0\). 


11.7 An Example of SV Model Estimation for Exchange Rates 


This section can be skipped by readers who are not interested in Excel calculations. 
The QML method is a straightforward method for estimating the parameters of 
the standard SV and related models. It is easy to implement within Excel, unlike 
other methods that instead have the advantage that they produce more accurate 
estimates of the parameters. We illustrate the QML calculations for the standard 
SV model because the calculations are easy to understand; we do not consider 
QML to be superior to alternative methods. 

Exhibit 11.2 shows part of the relevant spreadsheet for our DM/$ data from 
1991 to 2000. The expected return \(\mu\) is estimated by the average observed return 
\(\bar{r}\). The mean, standard deviation, and autoregressive parameters of the process 
\(\log(\sigma_t)\), respectively denoted by \(\alpha\), \(\beta\), and \(\phi\), are estimated by maximizing the 
quasi-likelihood of the observed quantities \(l_t = \log(|r_t - \bar{r}|)\), defined by 
equations (11.40) and (11.41). At time \(t\) we already know the conditional mean \(l_{t-1,1}\) 
and the conditional variance \(F_t\) for \(l_t\). We then observe the outcome for \(l_t\) and 
find the prediction error \(v_t\), followed by calculating the next conditional mean 
and variance, \(l_{t,1}\) and \(F_{t+1}\), using the filtering equations given in the appendix to 
this chapter. The spreadsheet row for time \(t\) is completed by calculating the 
standardized error \(z_t = v_t/\sqrt{F_t}\) and the contribution to the log-likelihood, namely 
\(-\tfrac{1}{2}(\log(2\pi) + \log(F_t) + v_t^2/F_t)\). 

The model parameters are in cells H3, H4, and H5 of Exhibit 11.2. The quantities 
\(\bar{r}\), \(\mu_\xi\), \(\alpha + \mu_\xi\), \(H = \pi^2/8\), and \(\sigma_\eta^2 = \beta^2(1 - \phi^2)\) are located in cells H10-H14. 
The initial values \(l_{0,1} = \alpha + \mu_\xi\) and \(F_1 = H + \beta^2\) are placed in cells E20 and 
G20. It is then only necessary to create formulae for cells D20, F20, H20, I20, 
E21, and G21 before copying and pasting can be used to complete the filtering 
calculations. The most important cell formulae are provided in Table 11.2. 

Exhibit 11.2 shows the results when the log-likelihood (in cell H7) is maximized 
with the constraints \(\beta > 0.0001\) and \(|\phi| < 0.9999\). The estimate \(\hat{\alpha} = -5.16\) is 
in line with previous DM/$ estimates but the "volatility of volatility" estimate 
\(\hat{\beta} = 0.288\) is relatively low. Alternative estimates are given by matching the 
standard deviation of returns and either the mean absolute deviation (\(\hat{\alpha} = -5.16\), 
\(\hat{\beta} = 0.409\)) or the kurtosis of returns (\(\hat{\alpha} = -5.15\), \(\hat{\beta} = 0.376\)), using 
formulae given in Section 11.6. The QML estimate of the persistence parameter is 
\(\hat{\phi} = 0.9839\) and it is slightly less than the GARCH(1, 1) persistence estimate of 
0.9908 for the same data, calculated in Section 9.4. 

Exhibit 11.2. QML parameter estimates for the standard SV model, 
from the DM/$ rate, 1991-2000. 

Table 11.2. Formulae used in the standard SV model spreadsheet. 

Cell   Formula 

D20    =LN(ABS(C20-$H$10)) 
E20    =H12 
E21    =(1-$H$5)*$H$12+($H$5*D20)-($H$5*$H$13*F20/G20) 
F20    =D20-E20 
G20    =H4*H4+H13 
G21    =$H$13+($H$5*$H$5*$H$13*(G20-$H$13)/G20)+$H$14 
H7     =SUM(I20:I2609) 
H12    =H3+H11 
H13    =PI()*PI()/8 
H14    =H4*H4*(1-H5*H5) 
H20    =F20/SQRT(G20) 
I20    =-0.5*(LN(2*PI())+LN(G20)+H20*H20) 

Summary statistics for the data \(l_t\) are similar to the theoretical values for the 
standard SV model when the QML estimates are inserted into the theoretical 
equations. This is seen in cells C3-D5 for the mean, standard deviation, and 
first autocorrelation of \(l_t\). The standardized errors \(z_t\) have satisfactory summary 
statistics, with the minimum value of \(-7.69\) reflecting the outliers that occur in the 
left tail for the QML method. The outliers can be trimmed by the method of Breidt 
and Carriquiry (1996). If we redefine \(2l_t\) to be \(\log(r_t^2 + cs^2) - cs^2/(r_t^2 + cs^2)\), with 
\(s\) the standard deviation of the returns, then the minimum value of \(z_t\) increases to 
\(-3.69\) when \(c = 0.0002\) and the QMLE is almost unchanged with \(\hat{\phi} = 0.9847\); 
some writers recommend \(c = 0.02\) but this produces unsatisfactory summary 
statistics for the DM/$ data. 

The filtering equations converge to a steady state. The series \(F_t\) is independent 
of the data and converges to 1.2754 for the estimated parameters. The quantity 
\(\phi H/F_t\) is also independent of the data and converges to the moving-average 
parameter of \(l_t\) when it is rewritten as an ARMA(1, 1) process with innovations 
\(v_t\); the limit is 0.9518 for the estimated parameters. 
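
The two limits can be reproduced by iterating the variance recursion alone, because it does not involve the data. The sketch below uses the recursion from cell G21 of Table 11.2 with the rounded estimates \(\hat{\beta} = 0.288\) and \(\hat{\phi} = 0.9839\); the function name, the tolerance, and the iteration limit are illustrative assumptions.

```python
import math

def steady_state(beta, phi, tol=1e-12, max_iter=100000):
    """Iterate F_{t+1} = H + phi^2 * H * (F_t - H) / F_t + sigma_eta^2 to its limit.
    Returns the limiting F and the implied moving-average parameter phi * H / F."""
    h = math.pi ** 2 / 8
    sigma_eta_sq = beta ** 2 * (1 - phi ** 2)
    f = beta ** 2 + h                      # initial value F_1
    for _ in range(max_iter):
        f_next = h + phi ** 2 * h * (f - h) / f + sigma_eta_sq
        converged = abs(f_next - f) < tol
        f = f_next
        if converged:
            break
    return f, phi * h / f

f_limit, ma_limit = steady_state(beta=0.288, phi=0.9839)
print(round(f_limit, 4), round(ma_limit, 4))
```

Small differences from the quoted limits in the final decimal place can arise because the parameter estimates are used here in rounded form.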

Figure 11.3 shows annualized percentage predictions of \(\sigma_t\) for the SV model that 
are joined by dark lines. These can be compared with GARCH(1, 1) predictions 
that are shown by light dots on the same figure. The SV prediction at time \(t - 1\) is 
a constant multiplied by \(\exp(l_{t-1,1})\), with the constant chosen to eliminate bias, 
so that the average value of the squared predictions equals the sample variance of 
the returns. It can be seen that the SV and GARCH predictions are usually similar. 
The standard deviation of the difference between the annualized values from the 
two methodologies is 1.1%. 

Figure 11.3. DM/$ volatility from the SV model. 


11.8 Independent SV Models with Heavy Tails 


The standard SV model supposes \(r_t - \mu\) is the product of independent variables, 
\(u_t \sim N(0, 1)\) and \(\sigma_t\), whose logarithm is given by a Gaussian, AR(1) process. 
Heavier tails for the returns can be modeled by now assuming that the \(u_t\) have a 
standardized t-distribution with \(\nu > 2\) degrees of freedom, the density function 
being given by (9.43). This modification of the standard SV model will be called 
the standard SVt model. It was first investigated by Harvey et al. (1994). 

From the definition of the Student t-distribution, \(u_t\) is the product of independent 
variables 

\[u_t = \sqrt{w_t}\,v_t \quad\text{with}\quad v_t \sim N(0, 1) \quad\text{and}\quad (\nu - 2)w_t^{-1} \sim \chi^2_\nu. \tag{11.42}\]

Excess returns are then a mixture of normal distributions, 

\[r_t - \mu = \sigma_t u_t = (\sigma_t\sqrt{w_t})\,v_t = \sigma_t^* v_t, \tag{11.43}\]

with \(\log(\sigma_t^*)\) following a non-Gaussian, ARMA(1, 1) process. 


11.8.1 State Space Representation 


The state space equations, (11.25), also apply for the SVt model but the distribution 
of 

\[\xi_t = \log(|u_t|) = \log(|v_t|) + \tfrac{1}{2}\log(w_t) \tag{11.44}\]

now has mean and variance respectively given by 

\[\mu_\xi = \tfrac{1}{2}\psi(\tfrac{1}{2}) - \tfrac{1}{2}\psi(\tfrac{1}{2}\nu) + \tfrac{1}{2}\log(\nu - 2) \tag{11.45}\]

and 

\[\sigma_\xi^2 = \tfrac{1}{8}\pi^2 + \tfrac{1}{4}\psi'(\tfrac{1}{2}\nu), \tag{11.46}\]

where \(\psi(x) = d\log\Gamma(x)/dx\) and \(\psi'(x) = d\psi/dx\) are the digamma and 
trigamma functions; similar formulae can be found in Ruiz (1994). 
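
The constants (11.45) and (11.46) can be evaluated without special libraries. The sketch below implements the digamma and trigamma functions by a standard recurrence combined with an asymptotic series; this numerical method, the switch-over point of 10, and the function names are choices made here for illustration.

```python
import math

def digamma(x):
    """psi(x) for x > 0: recurrence psi(x) = psi(x+1) - 1/x, then an asymptotic series."""
    total = 0.0
    while x < 10.0:
        total -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return total + math.log(x) - 0.5 / x - series

def trigamma(x):
    """psi'(x) for x > 0: recurrence psi'(x) = psi'(x+1) + 1/x^2, then a series."""
    total = 0.0
    while x < 10.0:
        total += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return total + inv + 0.5 * inv2 + inv * inv2 * (1.0 / 6 - inv2 * (1.0 / 30 - inv2 / 42))

def xi_mean(nu):
    """Mean of xi_t = log|u_t| for standardized t shocks, equation (11.45)."""
    return 0.5 * digamma(0.5) - 0.5 * digamma(0.5 * nu) + 0.5 * math.log(nu - 2.0)

def xi_variance(nu):
    """Variance of xi_t, equation (11.46)."""
    return math.pi ** 2 / 8 + 0.25 * trigamma(0.5 * nu)

for nu in (6.0, 15.9, 1.0e7):
    print(nu, xi_mean(nu), xi_variance(nu))
```

As \(\nu\) becomes very large the two constants approach the Gaussian values \(-0.63518\) and \(\pi^2/8\), while small \(\nu\) increases the variance of \(\xi_t\).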


11.8.2 Moments 


The expectation of \(r_t^p\) is finite only when \(p < \nu\), which contrasts with finite 
moments of all orders for the standard SV model. All the finite moments can 
be obtained from equations (11.26) and (11.27). Recall the notation defined by 
\(\log(\sigma_t) \sim N(\alpha, \beta^2)\). Now \(E[|r_t - \mu|] = E[|u_t|]\exp(\alpha + \tfrac{1}{2}\beta^2)\), with \(E[|u_t|]\) 
given by (10.4). Also, the variance of \(r_t\) is \(\exp(2\alpha + 2\beta^2)\), as before, since the 
\(u_t\) have unit variance, and the kurtosis of \(r_t\) is \(3(\nu - 2)\exp(4\beta^2)/(\nu - 4)\) when 
\(\nu > 4\). 


11.8.3 Autocorrelations 


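Approximate and exact formulae for the autocorrelations of \(a_t^p = |r_t - \mu|^p\), 
\(p > 0\), are respectively given in (11.33) and (11.36) for the standard SV model. 
These equations remain valid for the standard SVt model when \(2p < \nu\), with the 
function \(B(p)\) defined by (11.5) given by Ghysels et al. (1996) as 

\[B(p, \nu) = \frac{\Gamma(p + \tfrac{1}{2})\,\Gamma(-p + \tfrac{1}{2}\nu)\,\Gamma(\tfrac{1}{2})\,\Gamma(\tfrac{1}{2}\nu)}{[\Gamma(\tfrac{1}{2}p + \tfrac{1}{2})\,\Gamma(-\tfrac{1}{2}p + \tfrac{1}{2}\nu)]^2}, \quad \nu > 2p > 0. \tag{11.47}\]

The approximate formula becomes 

\[\rho_{\tau,a^p} \cong C(p, \beta, \nu)\phi^\tau = \frac{\exp(p^2\beta^2) - 1}{B(p, \nu)\exp(p^2\beta^2) - 1}\,\phi^\tau, \quad \tau > 0. \tag{11.48}\]

Ghysels et al. (1996) observe that \(B(p, \nu)\) declines as \(\nu\) increases so that the above 
autocorrelations are less than those when \(u_t \sim N(0, 1)\), which corresponds to the 
limit \(\nu \to \infty\). There is still a simple formula for the exact autocorrelations of 
\(l_t = \log(|r_t - \mu|)\), namely 

\[\rho_{\tau,l} = \frac{\beta^2\phi^\tau}{\beta^2 + \sigma_\xi^2}, \quad \tau > 0, \tag{11.49}\]

with \(\sigma_\xi^2\) the function of \(\nu\) defined by (11.46). 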
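
The effect of the degrees of freedom on these autocorrelations can be examined with the following sketch of (11.47) and (11.48). Log-gamma functions are used so that large values of \(\nu\) do not cause overflow; that implementation detail and the function names are choices made here.

```python
import math

def b_student(p, nu):
    """B(p, nu) of (11.47) for standardized t shocks; requires nu > 2p > 0."""
    if not nu > 2 * p > 0:
        raise ValueError("B(p, nu) requires nu > 2p > 0")
    log_num = (math.lgamma(p + 0.5) + math.lgamma(0.5 * nu - p)
               + math.lgamma(0.5) + math.lgamma(0.5 * nu))
    log_den = 2.0 * (math.lgamma(0.5 * p + 0.5) + math.lgamma(0.5 * nu - 0.5 * p))
    return math.exp(log_num - log_den)

def rho_abs_power(tau, p, beta, phi, nu):
    """Approximate autocorrelation of |r_t - mu|^p from (11.48)."""
    e = math.exp(p * p * beta * beta)
    return (e - 1.0) / (b_student(p, nu) * e - 1.0) * phi ** tau

print(b_student(2, 10))                       # kurtosis of the t-distribution, 3(nu-2)/(nu-4)
print(rho_abs_power(1, 2, 0.4, 0.98, 10))     # fat-tailed shocks
print(rho_abs_power(1, 2, 0.4, 0.98, 1.0e6))  # nearly normal shocks
```

When \(p = 2\) the function \(B(p, \nu)\) is the kurtosis of \(u_t\), which gives a convenient check on the formula.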


11.8.4 Parameter Estimates 


All the estimation methods for the standard SV model can be adapted to estimate 
the parameters of the standard SVt model. The easiest estimation method 
uses the Kalman filter applied to the state space representation to obtain the 
quasi-maximum likelihood function. The additional degrees-of-freedom parameter 
\(\nu\) changes the constants \(\mu_\xi\) and \(H = \sigma_\xi^2\) that appear in the Kalman filtering 
equations, but the equations themselves are unchanged; they are provided in the 
appendix to this chapter. The QML method then estimates the additional parameter 
\(\nu\), constrained to be at least 2. Harvey et al. (1994) estimate \(\nu\) to be near 6 
for two exchange rate series and to be infinity for two others. 

The QMLE of \(\nu\) for our benchmark DM/$ series of daily returns is 15.9, implying 
a distribution for the shocks \(u_t\) that is close to normal. The quasi-log-likelihood 
is only 0.53 more than for the standard SV model estimated in the previous section. 
Therefore, the likelihood values do not provide any evidence against the simpler 
model. The other QML estimates for the SVt model are \(\hat{\alpha} = -5.12\), \(\hat{\beta} = 0.284\), 
and \(\hat{\phi} = 0.9849\). 

MCMC methods for SVt models are provided by Chib, Nardari, and Shephard 
(2002) and Jacquier et al. (2004) while simulated MLE is implemented by Sandmann 
and Koopman (1998). These researchers estimate \(\nu\) to be between 7 and 13 
for daily S&P 500 index returns, although the standard SVt model estimated by 
some of them is mis-specified because it is not an asymmetric volatility model. 


11.9 Asymmetric Stochastic Volatility Models 


The independent SV models considered in previous sections do not allow volatility 
to depend on the direction of price changes. Asymmetric volatility effects can be 
modeled within the SV framework by supposing that there is some dependence 
between the shocks to volatility and the standardized shocks in the logarithms of 
prices. Such dependence is a feature of bivariate diffusion models for price and 
volatility that have been used to price options, to be discussed in Sections 13.4 
and 14.6. The particular diffusion models of Scott (1987, 1991), Wiggins (1987), 
and Chesney and Scott (1989) lead directly to the two equations that we have 
already employed for the standard SV model: 


\[r_t = \mu + \sigma_t u_t \quad\text{and}\quad \log(\sigma_t) - \alpha = \phi(\log(\sigma_{t-1}) - \alpha) + \eta_t. \tag{11.50}\]


Asymmetric volatility effects occur in this model if there is appropriate dependence 
between \(\eta_t\) and one or more of the standardized price shocks \(\{u_t, u_{t-1}, \ldots\}\). 
One possibility is to assume the vector variables \((u_t, \eta_t)'\) are i.i.d. with a bivariate 
normal distribution. This contemporaneous SV model is rather unsatisfactory as 
the returns process is no longer uncorrelated when \(u_t\) is correlated with \(\eta_t\), which 
is incompatible with the motivation from diffusion processes (Taylor 1994a); also, 
the expected return ceases to equal \(\mu\). 


11.9.1 The General SV Model 


Another way to incorporate asymmetry, which has proved far more popular, 
assumes the variables \((u_t, \eta_{t+1})'\) have bivariate normal distributions with 

\[\begin{pmatrix} u_t \\ \eta_{t+1} \end{pmatrix} \sim \text{i.i.d. } N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \delta\sigma_\eta \\ \delta\sigma_\eta & \sigma_\eta^2 \end{pmatrix}\right), \tag{11.51}\]

so that \(\delta\) is the correlation between \(u_t\) and \(\eta_{t+1}\). Equations (11.50) and (11.51) 
define the general SV model, which is the Euler approximation to the motivating 
diffusion process. The excess returns \(r_t - \mu\) are a martingale difference and 
hence are uncorrelated, because the variable \(u_t\) is independent of all subsets of 
\(\{\sigma_t, \sigma_{t-1}, \sigma_{t-2}, \ldots, u_{t-1}, u_{t-2}, \ldots\}\). A negative correlation \(\delta\) induces a negative 
correlation between \(r_t\) and \(\sigma_{t+1} - \sigma_t\) and then the model generates asymmetric 
volatility effects in a manner similar to that of the EGARCH(1) model of 
Section 10.2. 
We next review some theoretical results for the general SV model. 
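
The timing convention in (11.51), with \(u_t\) correlated with the next volatility shock \(\eta_{t+1}\), may be easier to see in a simulation. The sketch below is an illustration only: the use of \(\sigma_\eta^2 = \beta^2(1 - \phi^2)\), the starting draw from the stationary distribution, the seed, and the function name are choices made here, and the sketch assumes \(|\phi| < 1\) and \(|\delta| \leq 1\).

```python
import math
import random

def simulate_general_sv(n, mu, alpha, beta, phi, delta, seed=0):
    """Simulate n returns from (11.50) and (11.51); returns (returns, log volatilities)."""
    rng = random.Random(seed)
    sigma_eta = beta * math.sqrt(1.0 - phi ** 2)
    log_sigma = alpha + beta * rng.gauss(0.0, 1.0)   # stationary starting value
    returns, log_sigmas = [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        returns.append(mu + math.exp(log_sigma) * u)
        log_sigmas.append(log_sigma)
        # eta_{t+1} has variance sigma_eta^2 and correlation delta with u_t.
        e = rng.gauss(0.0, 1.0)
        eta_next = sigma_eta * (delta * u + math.sqrt(1.0 - delta ** 2) * e)
        log_sigma = alpha + phi * (log_sigma - alpha) + eta_next
    return returns, log_sigmas

r, ls = simulate_general_sv(5000, 0.0, -5.0, 0.4, 0.98, -0.6, seed=1)
changes = [ls[t + 1] - ls[t] for t in range(len(ls) - 1)]
```

With a negative \(\delta\), returns and subsequent changes in log volatility should be negatively related in the simulated data, which is the asymmetric effect that the general model is designed to capture.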


11.9.2 Moments 


The expectations of univariate functions of returns do not depend on the correlation 
\(\delta\), as \(u_t\) is independent of \(\sigma_t\). Consequently, the moments of returns (variance, 
kurtosis, etc.) are as already given in Section 11.5. Likewise, with 
\(l_t = \log(|r_t - \mu|)\) the expectation of \(l_t l_{t+\tau}\) does not depend on the correlation \(\delta\) 
because \(\eta_{t+1}\) and \(\log(|u_t|)\) are uncorrelated (Harvey et al. 1994). However, the 
expectations of other products do depend on \(\delta\). For example, with \(s_t = (r_t - \mu)^2\), 

\[E[s_t s_{t+\tau}] = \exp(4\alpha + 4(1 + \phi^\tau)\beta^2)\,[1 + 4\delta^2\sigma_\eta^2\phi^{2(\tau-1)}], \quad \tau > 0, \tag{11.52}\]

with \(\beta^2\) the variance of \(\log(\sigma_t)\). This result and others that follow can be derived 
by applying the assumption that the vector variables \((u_t, y_{t+1})'\) are i.i.d., for 
\(y_{t+1} = \exp(\eta_{t+1})\), with \(E[y_{t+1}] = \exp(\sigma_\eta^2/2) = A\), \(E[u_t y_{t+1}] = \delta\sigma_\eta A\), and 
\(E[u_t^2 y_{t+1}] = (1 + \delta^2\sigma_\eta^2)A\). 

Estimation of the model parameters by GMM requires moments that distinguish 
between positive and negative values of \(\delta\). Chesney and Scott (1989) apply the 
result 

\[\operatorname{cov}(r_t, l_{t+1} - l_t) = \delta\sigma_\eta\exp(\alpha + \tfrac{1}{2}\beta^2), \tag{11.53}\]

while Melino and Turnbull (1990) make use of the covariances between returns 
and subsequent absolute excess returns, the first being 

\[\operatorname{cov}(r_t, |r_{t+1} - \mu|) = \delta\sigma_\eta\sqrt{2/\pi}\,\exp(2\alpha + \beta^2(1 + \phi)). \tag{11.54}\]

The covariance between \(r_t\) and \(s_{t+1}\) is also proportional to \(\delta\), as noted later in 
(11.58). The return \(r_t\) is more highly correlated with \(|r_{t+1} - \mu|\) than with \(l_{t+1} - l_t\). 
For the typical values \(\beta = 0.4\) and \(\phi = 0.98\), the covariances in (11.53) and 
(11.54) respectively imply correlations equal to 0.0476 and 0.0946. 


11.9.3 Autocorrelations 
The autocorrelations of \(l_t\) are 

\[\rho_{\tau,l} = \frac{8\beta^2\phi^\tau}{8\beta^2 + \pi^2}, \quad \tau > 0, \tag{11.55}\]

for any \(\delta\) (Harvey et al. 1994), while those of \(s_t\) are (Taylor 1994a): 

\[\rho_{\tau,s} = \frac{[1 + 4\delta^2\sigma_\eta^2\phi^{2(\tau-1)}]\exp(4\beta^2\phi^\tau) - 1}{3\exp(4\beta^2) - 1}, \quad \tau > 0. \tag{11.56}\]
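
The size of the additional term in (11.56) can be gauged with a short calculation. The sketch below assumes \(\sigma_\eta^2 = \beta^2(1 - \phi^2)\), the relationship used for the standard SV model in Section 11.7; the function name and the parameter values are illustrative.

```python
import math

def rho_s_general(tau, beta, phi, delta):
    """Autocorrelation (11.56) of squared excess returns for the general SV model."""
    sigma_eta_sq = beta ** 2 * (1.0 - phi ** 2)
    extra = 1.0 + 4.0 * delta ** 2 * sigma_eta_sq * phi ** (2 * (tau - 1))
    return (extra * math.exp(4.0 * beta ** 2 * phi ** tau) - 1.0) / (3.0 * math.exp(4.0 * beta ** 2) - 1.0)

for delta in (0.0, -0.4, -0.8):
    print(delta, rho_s_general(1, 0.4, 0.98, delta))
```

Setting \(\delta = 0\) recovers the result for the standard SV model, and the autocorrelations depend on \(\delta\) only through \(\delta^2\), so they cannot by themselves reveal the sign of the correlation.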


11.9.4 Parameter Estimates 


The most straightforward method for estimating the parameters of the general
SV model has been developed by Harvey and Shephard (1996). They modify the
quasi-maximum likelihood estimation methodology, based upon the state space
representation, to enable QMLE to deliver an estimate of the correlation $\delta$. They
exploit the fact that the sign of the excess return, $S_t = \mathrm{sgn}(r_t - \mu) = \mathrm{sgn}(u_t)$,
provides some information about the distribution of $\eta_{t+1}$, since
$E[\eta_{t+1} \mid S_t] = \sqrt{2/\pi}\,\delta\sigma_\eta S_t$ for the general SV model. The Kalman filter can then
be applied after conditioning the terms $l_t$ on the signs $S_t$. Harvey and Shephard
(1996, Table 3) estimate $\delta = -0.66$, with a standard error of 0.05, and $\phi = 0.988$
(s.e. 0.003) for the daily CRSP returns used by Nelson (1991) to estimate EGARCH
models. Their methods can also be revised to estimate the general SVt model. Note,
however, that my notation is similar to, but different from, theirs.

Two other papers have also used Nelson's data to estimate asymmetric
specifications, but in both a contemporaneous SV model is estimated. Sandmann and
Koopman (1998) use their simulated MLE method to estimate $\delta = -0.375$, with
a reported standard error of only 0.004, and $\phi = 0.985$ (s.e. 0.003). Jacquier et
al. (2004) find posterior distributions for a contemporaneous SVt model. Their
point estimate of $\delta$ is −0.48, with a 95% posterior interval from −0.54 to −0.42;
for $\phi$ the point estimate is 0.988 with a 95% interval from 0.984 to 0.992. Their
estimates of the degrees-of-freedom parameter $\nu$, defined in Section 11.7, vary
from 10 to 32 for the six series that they consider.

Yu (2004) estimates both the general and the contemporaneous SV models by 
the MCMC method for two US stock index series. He finds very strong empirical 
evidence that the general specification provides the more accurate description 
of his data. The preferred estimates of the correlation $\delta$ are −0.32 and −0.39,
respectively for a few years of S&P 500 and CRSP returns. 


11.9.5 Aggregation 


A variety of theoretical results for sums of returns are included in Ghysels et
al. (1996). We only discuss distributional results. The distribution of returns is
symmetric for the general SV model because $u_t$ has a symmetric distribution that
is independent of $\sigma_t$. The distribution of aggregated returns is, however, negatively
skewed when the correlation $\delta$ is negative. Then volatility tends to increase after
a fall in prices and hence the probability of a large fall over more than one period
exceeds the probability of a corresponding large rise.

Let N-period returns and their central moments be defined by

$$r_{t,N} = r_t + \cdots + r_{t+N-1} \quad\text{and}\quad m_{N,p} = E[(r_{t,N} - N\mu)^p]. \qquad (11.57)$$

Then the N-period skewness is

$$\mathrm{skewness}(N) = m_{N,3}/m_{N,2}^{3/2},$$

which can be evaluated using the results

$$m_{N,2} = N\exp(2\alpha + 2\beta^2) \quad\text{and}\quad m_{N,3} = 3\sum_{j=1}^{N-1}(N - j)\,a_j \quad\text{for } N \ge 2,$$

with

$$a_j = \mathrm{cov}(r_t, (r_{t+j} - \mu)^2) = 2\delta\sigma_\eta\phi^{j-1}\exp(3\alpha + \tfrac{1}{2}(5 + 4\phi^j)\beta^2) \quad\text{for } j \ge 1. \qquad (11.58)$$

The skewness is proportional to the correlation $\delta$. When the persistence $\phi$ is 0.98
and the standard deviation of volatility is $\beta = 0.4$, the skewness is zero when
N = 1 and $0.213\delta$ when N = 2. Assuming $\delta$ is negative, the skewness declines as
N increases until attaining its minimum value of $1.54\delta$ when N = 95; thereafter
the skewness is an increasing function of N, with limit zero as N → ∞ from a
central limit theorem.
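
The skewness formula is easy to evaluate numerically. The following Python sketch implements (11.57) and (11.58) as written above; the function name and default arguments are illustrative choices, and the parameter $\alpha$ is omitted because it cancels in the ratio $m_{N,3}/m_{N,2}^{3/2}$.

```python
import math


def skewness_general_sv(n, delta, beta=0.4, phi=0.98):
    """Skewness of N-period returns for the general SV model.

    Uses m_{N,2} and m_{N,3} from (11.57)-(11.58). The level parameter
    alpha cancels in the ratio, so it is set to zero here.
    """
    if n < 2:
        return 0.0  # one-period returns are symmetric
    sigma_eta = beta * math.sqrt(1.0 - phi ** 2)

    def a(j):
        # a_j = cov(r_t, (r_{t+j} - mu)^2), with alpha = 0
        return (2.0 * delta * sigma_eta * phi ** (j - 1)
                * math.exp(0.5 * (5.0 + 4.0 * phi ** j) * beta ** 2))

    m3 = 3.0 * sum((n - j) * a(j) for j in range(1, n))
    m2 = n * math.exp(2.0 * beta ** 2)
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    for n in (1, 2, 21, 95, 250):
        print(n, round(skewness_general_sv(n, delta=-0.5), 4))
```

With $\delta = 1$ the function returns the coefficient that multiplies $\delta$, which is about 0.213 when N = 2 for the illustrative parameter values.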
The kurtosis of multi-period returns can be found from the further equations:

$$\mathrm{kurtosis}(N) = m_{N,4}/m_{N,2}^{2},$$

$$m_{N,4} = 3N\exp(4\alpha + 8\beta^2) + 6\sum_{j=1}^{N-1}(N - j)\,b_j + 12\sum_{1 \le i < j < k \le N} c_{i,j,k},$$

$$b_j = E[s_t s_{t+j}],$$

$$c_{i,j,k} = E[(r_{t+i} - \mu)(r_{t+j} - \mu)(r_{t+k} - \mu)^2]
 = 2\delta^2\sigma_\eta^2\phi^{k-i-2}(1 + 2\phi^{k-j})\exp(4\alpha + 3\beta^2 + d_{i,j,k}),$$

and

$$d_{i,j,k} = (\phi^{j-i} + 2\phi^{k-j} + 2\phi^{k-i})\,\beta^2, \qquad (11.59)$$

with $b_j$ given by (11.52). Consequently, the kurtosis is a quadratic function of the
correlation $\delta$. Using the same illustrative parameter values as before, the kurtosis
equals 5.69 when N = 1, $5.65 + 0.07\delta^2$ when N = 2, $5.25 + 0.86\delta^2$ when
N = 21, and $4.19 + 2.15\delta^2$ when N = 126; it is a decreasing function of N
when $\delta = 0$ or $\delta = -0.5$, but has a maximum kurtosis of 6.53 at N = 75 when
$\delta = -1$.


11.10 Long Memory SV Models


In Sections 11.5-11.9 we have only considered a first-order autoregressive pro- 
cess for the logarithm of volatility. The autocorrelations of volatility then decay 
geometrically and volatility is said to have a short memory. The same property of 
a short memory carries over to the process of absolute returns and its power trans- 
formations. We noted in Section 10.3 that empirical autocorrelations for absolute 
returns and squared returns provide evidence that short memory volatility
processes may be unsatisfactory. Instead, long memory may be a necessary feature of
a satisfactory volatility model. We discuss the evidence for long memory in
volatility in much more detail in the next chapter, where the context is high-frequency
data analysis. Here we consider long memory models of volatility that have been 
motivated by studies of daily returns. These models have similar properties to the 
long memory ARCH models described in Section 10.3. 


11.10.1 The FISV Model 


Breidt, Crato, and de Lima (1998), Harvey (1998), and Arteche (2004) define and
investigate independent SV models for returns that have the logarithm of volatility
following the ARFIMA(p, d, q) process that originates in Granger (1980) and
Hosking (1981). The special case when p = 1 and q = 0 has received the
most attention. It is presented here as an extension of the standard SV model of
Section 11.5 and called the FISV model. The returns and volatility processes are
defined by

$$r_t = \mu + \sigma_t u_t$$

and

$$\log(\sigma_t) = \alpha + (1 - \phi L)^{-1}(1 - L)^{-d}\eta_t, \qquad (11.60)$$

with L the usual lag operator. This simplifies to the standard SV model when
d is zero. Both returns and volatility have a long memory and are covariance
stationary when 0 < d < 1/2.
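
To make the role of the fractional filter concrete, the sketch below expands $(1 - L)^{-d}$ as a moving average and uses a truncated version of it to simulate the FISV model (11.60). It is an illustration only: the truncation length, burn-in, normal shocks, and function names are assumptions made here, not part of the cited papers.

```python
import numpy as np


def frac_integration_weights(d, n_terms):
    """Coefficients psi_k in (1 - L)^(-d) = sum_k psi_k L^k.

    psi_0 = 1 and psi_k = psi_{k-1} * (k - 1 + d) / k.
    """
    psi = np.empty(n_terms)
    psi[0] = 1.0
    for k in range(1, n_terms):
        psi[k] = psi[k - 1] * (k - 1 + d) / k
    return psi


def simulate_fisv(n, alpha, phi, d, sigma_eta, mu=0.0, n_terms=1000, seed=0):
    """Simulate returns from (11.60) with the MA(infinity) filter truncated."""
    rng = np.random.default_rng(seed)
    psi = frac_integration_weights(d, n_terms)
    burn = 2 * n_terms                      # discard start-up values
    eta = rng.normal(0.0, sigma_eta, size=n + burn)
    # x_t = (1 - L)^(-d) eta_t, truncated after n_terms lags
    x = np.convolve(eta, psi)[: len(eta)]
    # h_t = (1 - phi L)^(-1) x_t, i.e. h_t = phi * h_{t-1} + x_t
    h = np.zeros(len(x))
    for t in range(1, len(x)):
        h[t] = phi * h[t - 1] + x[t]
    log_sigma = alpha + h[-n:]
    u = rng.standard_normal(n)
    return mu + np.exp(log_sigma) * u, log_sigma


if __name__ == "__main__":
    print(frac_integration_weights(0.4, 5))
    returns, log_sigma = simulate_fisv(2000, alpha=-4.6, phi=0.9, d=0.4,
                                       sigma_eta=0.1)
    print(returns.std(), log_sigma.mean())
```

The weights decay hyperbolically rather than geometrically, which is the source of the long memory; setting d = 0 leaves only $\psi_0 = 1$ and recovers the AR(1) specification of the standard SV model.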

The autocorrelations of $\log(\sigma_t)$ can be evaluated from formulae in Baillie
(1996), Breidt et al. (1998), and Harvey (1998). They are asymptotically
proportional to $\tau^{2d-1}$, assuming 0 < d < 1/2, and the same asymptotic result is
applicable to $\sigma_t$, $a_t = |r_t - \mu|$, and $s_t = (r_t - \mu)^2$ (Andersen and Bollerslev
1997a).


11.10.2 Estimation 


When the variables $\log(\sigma_t)$ follow an ARFIMA process, the logarithms of absolute
excess returns, $l_t = \log(|r_t - \mu|)$, also follow an ARFIMA process. Furthermore,
the differencing parameter d is the same for $\log(\sigma_t)$ and $l_t$. The spectral densities
of these processes are approximately proportional to $\omega^{-2d}$ for small positive $\omega$.
Consequently, d can be estimated from observations of $l_t$ (after replacing $\mu$ by
$\bar{r}$) using some version of the spectral estimator proposed by Geweke and
Porter-Hudak (1983), without first specifying values for p and q. Bollerslev and Wright
(2000) investigate a suitable estimator of d using Monte Carlo methods. They
document its downward bias when low-frequency data are used.
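
A log-periodogram regression of this kind can be sketched in a few lines. The version below is a simplified variant, not the exact Geweke and Porter-Hudak estimator: it regresses the log periodogram on $-2\log(\omega_j)$ at the lowest m Fourier frequencies, and the default bandwidth $m = \sqrt{n}$ is an assumption made here for illustration.

```python
import numpy as np


def log_periodogram_slope(omega, pgram):
    """Least-squares slope in log I(omega) = c + d * (-2 log omega)."""
    slope, _intercept = np.polyfit(-2.0 * np.log(omega), np.log(pgram), 1)
    return slope


def gph_estimate(x, m=None):
    """Estimate d from the series x using the m lowest Fourier frequencies."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if m is None:
        m = int(np.sqrt(n))                 # illustrative bandwidth choice
    j = np.arange(1, m + 1)
    omega = 2.0 * np.pi * j / n
    dft = np.fft.fft(x - x.mean())[1 : m + 1]
    pgram = np.abs(dft) ** 2 / (2.0 * np.pi * n)
    return log_periodogram_slope(omega, pgram)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    returns = 0.01 * rng.standard_normal(4096)      # placeholder data
    l = np.log(np.abs(returns - returns.mean()))    # l_t with mu replaced by the mean
    print(gph_estimate(l))
```

For the placeholder i.i.d. returns used in the usage lines, the estimate should be near zero; applied to $l_t$ from real daily returns it gives a rough estimate of the common differencing parameter d.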

Breidt et al. (1998) describe spectral quasi-likelihood estimation for the FISV
model defined by (11.60). Estimation from the process for $l_t$ is not practical in
the time domain but the quasi-likelihood function for this process has a simple
form when it is stated in the frequency domain. A Monte Carlo study of the
QML estimates shows they are accurate and almost unbiased for a series of 4096
observations, when d is the typical value 0.4 and $\phi$ is 0, 0.4, or 0.8. The estimates
are less satisfactory for a series of 1024 observations. Asymmetric extensions of
the FISV model can also be partially estimated by maximizing the spectral
quasi-likelihood. The method then estimates all of the parameters except the correlation
$\delta$ defined in (11.51). Breidt et al. (1998) estimate d = 0.444 and $\phi = 0.932$
for the daily returns from a value-weighted CRSP index between 1962 and 1989.
They permit the variance of $\log(u_t^2)$ to be an additional parameter, whose estimate
suggests a distribution with a heavy tail.

Gallant et al. (1997) use their efficient method of moments technique
to estimate several SV models including a long-memory model. Their estimates 
of d range from 0.48 to 0.55 for a very long series of adjusted daily returns from 
the S&P composite price index. 


11.11 Multivariate Stochastic Volatility Models 


Two ways to model the volatility of several assets by using component SV models 
have been investigated. The first methodology specifies each of N return processes 
as a standard SV model and incorporates parameters that permit general covari- 
ances between asset returns and asset volatilities. A general multivariate model, 
based upon remarks in Ghysels et al. (1996), is given by 


$$r_{i,t} = \mu_i + \sigma_{i,t}u_{i,t},$$
$$L_{i,t} = \log(\sigma_{i,t}),$$
$$L_t = (L_{1,t}, \ldots, L_{N,t})' = \alpha + \Phi(L_{t-1} - \alpha) + \eta_t. \qquad (11.61)$$


The vector AR(1) process $L_t$ has i.i.d. residuals $\eta_t$ that are stochastically
independent of the i.i.d. vector of shocks $u_t$; the distributions of $\eta_t$ and $u_t$ are
multivariate normal, with covariance matrices $\Sigma_\eta$ and $\Sigma_u$, and the diagonal entries
in $\Sigma_u$ are all unity. The matrix $\Sigma_\eta$ can be singular, so that each asset's volatility
is a linear combination of fewer than N common terms.

Harvey et al. (1994) estimate a special case of this model for four dollar
exchange rates by using the multivariate extension of the QML method, which
only requires applying the Kalman filter to a vector process. They assume $\Phi$ is
the identity matrix and then a principal components analysis of the estimate of $\Sigma_\eta$
shows that two factors explain the volatilities of the four rates. Mahieu and
Schotman (1994) use QML estimation for all six bilateral rates obtained from the dollar
and three other currencies, thus avoiding the selection of a numeraire currency.
Both of these studies show how to find the covariance matrix of $\xi_t = \log(|u_t|)$
from $\Sigma_u$. Danielsson (1997) describes a multivariate simulated MLE algorithm
and obtains higher likelihoods than for multivariate ARCH models.

A second methodology explicitly assumes a factor structure for volatility. Kim,
Shephard, and Chib (1998) propose that the N × 1 vector of returns $r_t$ depends
on an unobservable k × 1 factor $f_t$ and an idiosyncratic N × 1 residual $e_t$; thus,

$$r_t = \mu + Bf_t + e_t \qquad (11.62)$$

for some N × k matrix of factor loadings B. Each component of $f_t$ and $e_t$ is
assumed to follow a standard SV model and these N + k models are independent
of one another. The model is identifiable when $b_{i,i} = 1$ and $b_{i,j} = 0$ for $i \le k$
and $j > i$, giving a total number of volatility and factor parameters equal to
3(N + k) + Nk − k(k + 1)/2.

Pitt and Shephard (1999) use MCMC methods to estimate this model for five 
dollar exchange rates and two factors, finding that one- and two-factor models 
give similar results for daily returns from 1981 to 1998. The persistence parameter 
of the volatility in the one-factor model is 0.970. Chib, Nardari, and Shephard 
(2005) enhance the MCMC method and use their algorithm to estimate a more 
complicated factor model. 


11.12 ARCH versus SV 
11.12.1 Comparisons 


Several volatility models are similar in the sense that they can explain the major 
stylized facts for asset returns. It is natural to seek tests that can decide which of 
these models provides the best description of asset returns. There is a powerful test 
for comparing the two-state Markov chain model for volatility with the standard 
SV model, as noted in Section 11.5. This test provides strong empirical evidence 
against the two-state model for the £/$ exchange rate (Taylor 1999). 

The more interesting comparison for symmetric volatility models is between 
the standard SV model and either the GARCH(1, 1) model or the symmetric 
version of the EGARCH(1) model. We have to be aware, however, that there may 
be no useful way to discriminate between these models. The mathematical reason 
is that the limit of EGARCH(1) for higher-frequency data is the diffusion process 
that motivates the general SV model (see Section 13.4 and Nelson (1990a)). 
Consequently, debating the advantages of these models as descriptions of data 
may be pointless. 

Kim, Shephard, and Chib (1998) compare the likelihoods of standard SV models
with those of GARCH(1, 1) models that have either (a) conditional normal or
(b) conditional t-distributions. For comparison (a), the three-parameter SV model
has higher likelihoods than the three-parameter ARCH model for daily
observations of four exchange rates from 1981 to 1985. Nonnested likelihood-ratio tests
strongly favor the SV model. For comparison (b), the ARCH model has an extra
parameter and is within the SV family as defined in Section 11.2. It also has
higher likelihoods than the standard SV model. Empirical likelihood comparisons
between GARCH(1, 1)-t and SVt would be helpful but none are known. Given
that degrees-of-freedom estimates are higher for SV than for ARCH, we may
anticipate that the likelihood advantage of SV when comparing three-parameter
models may disappear when comparing models that have four volatility
parameters.

An esoteric difference between standard SV models and ARCH models is based
upon the concept of reversing time. A stochastic process is reversible if and only if
the likelihood function $L(r_{t+1}, r_{t+2}, \ldots, r_{t+n})$ equals $L(r_{t+n}, r_{t+n-1}, \ldots, r_{t+1})$
for all n and t. Gaussian AR(1) processes are reversible and hence the standard
SV model also has this property. The EGARCH(1) model is not reversible. Taylor
(1994a) states that it is difficult to make constructive use of this insight when the
ARCH model is not conditionally normal.


11.12.2 SARV


It is possible to combine parametric ARCH and SV models into a general struc- 
ture. Andersen (1994) defines a flexible general stochastic process that includes 
popular ARCH and SV specifications as special cases. His polynomial stochas- 
tic autoregressive volatility (SARV) model is defined by the following equations 
when conditional expected returns are constant: 


$$r_t = \mu + g(K_t)z_t$$

and

$$K_t = \omega + \beta K_{t-1} + (\gamma + \alpha K_{t-1})\xi_t, \qquad (11.63)$$

with $z_t \sim$ i.i.d. $D_z(0, 1)$, $\xi_t \sim$ i.i.d. $D_\xi(1, \sigma_\xi^2)$ and with $z_t$ independent of $\xi_{t-j}$,
$j \ge 0$. These equations simplify to the GARCH(1, 1) model when $\gamma = 0$,
$\xi_t = z_{t-1}^2$ and $g(K_t) = \sqrt{K_t}$, with $K_t$ then being the conditional variance. There
is also a simplification to the standard SV model when $\alpha = 0$, $\xi_t = \eta_t + 1$, and
$g(K_t) = \exp(K_t)$, with $K_t$ then representing the SV variable $\log(\sigma_t)$. Andersen
(1994) also shows that the EGARCH(1) and the general SV models are members
of the SARV family. He recommends estimating the SARV parameters by the
generalized method of moments and he provides formulae for evaluating
appropriate moments. An illustration of the methodology is provided for a bivariate
dataset of returns and trading volumes. Further discussion of the SARV family
can be found in Ghysels et al. (1996) and Meddahi and Renault (2004).
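
The nesting of GARCH(1, 1) inside the SARV recursion can be checked numerically. The sketch below iterates (11.63) with $\gamma = 0$ and $\xi_t = z_{t-1}^2$ and returns the implied conditional variances; the function names, starting value, and parameter values are illustrative assumptions.

```python
import numpy as np


def sarv_path(omega, beta, gamma, alpha, xi, k0):
    """Iterate K_t = omega + beta*K_{t-1} + (gamma + alpha*K_{t-1})*xi_t.

    xi[t - 1] holds the shock xi_t used to move from K_{t-1} to K_t.
    """
    k = np.empty(len(xi) + 1)
    k[0] = k0
    for t in range(1, len(k)):
        k[t] = omega + beta * k[t - 1] + (gamma + alpha * k[t - 1]) * xi[t - 1]
    return k


def garch_via_sarv(omega, alpha, beta, z, h0):
    """GARCH(1,1) excess returns as the SARV special case with g(K) = sqrt(K)."""
    z = np.asarray(z, dtype=float)
    xi = z[:-1] ** 2                      # xi_t = z_{t-1}^2
    h = sarv_path(omega, beta, 0.0, alpha, xi, h0)
    return np.sqrt(h) * z, h              # (r_t - mu, conditional variance)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.standard_normal(1000)
    e, h = garch_via_sarv(omega=1e-6, alpha=0.05, beta=0.93, z=z, h0=5e-5)
    print(e.std(), h.mean())
```

Since $(r_{t-1} - \mu)^2 = K_{t-1}z_{t-1}^2$ when $g(K) = \sqrt{K}$, the recursion reproduces the familiar form $h_t = \omega + \alpha(r_{t-1} - \mu)^2 + \beta h_{t-1}$.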


11.13 Concluding Remarks 


Stochastic volatility models provide alternative models and methodologies to 
ARCH models. SV models give more prominence to volatility because these 
models specify a process for volatility, rather than for conditional variances. The 
downside of the central role of volatility in SV models is that the conditional 
variances are very complicated functions, so that maximum likelihood estimation 
is far from straightforward. 

High-frequency data provides useful volatility information that cannot be ex- 
tracted from daily returns. We consider this information in the next chapter and 
see that it offers further methods for estimating the parameters of SV models and 
the unobservable volatility process. Later, the continuous-time versions of SV 
models are defined in Chapter 13 and their option pricing formulae are presented 
and discussed in Chapter 14. 
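
11.14 Appendix: Filtering Equations

The standard SV model has the linear state space representation given by (11.25):

$$l_t = L_t + \xi_t \quad\text{and}\quad L_t = (1 - \phi)\alpha + \phi L_{t-1} + \eta_t,$$

with $E[\xi_t\eta_t] = 0$. This can be rewritten in a conventional form after defining
zero-mean variables by $\xi_t^* = \xi_t - \mu_\xi$ and $L_t^* = L_t - \alpha$, so that

$$l_t = \alpha + \mu_\xi + L_t^* + \xi_t^* \quad\text{and}\quad L_t^* = \phi L_{t-1}^* + \eta_t.$$

Following Sandmann and Koopman (1998), let

$$Z_t = (1 \;\; 1), \quad H_t = \sigma_\xi^2, \quad
\alpha_t = \begin{pmatrix} L_t^* \\ \alpha + \mu_\xi \end{pmatrix}, \quad
T_t = \begin{pmatrix} \phi & 0 \\ 0 & 1 \end{pmatrix}, \quad
Q_t = \begin{pmatrix} \sigma_\eta^2 & 0 \\ 0 & 0 \end{pmatrix}, \quad
\varepsilon_t = \xi_t^*, \quad
\tilde{\eta}_t = \begin{pmatrix} \eta_t \\ 0 \end{pmatrix}.$$

Then

$$l_t = Z_t\alpha_t + \varepsilon_t \quad\text{and}\quad \alpha_t = T_t\alpha_{t-1} + \tilde{\eta}_t,$$

with $\mathrm{var}(\varepsilon_t) = H_t$, $\mathrm{var}(\tilde{\eta}_t) = Q_t$, $E[\varepsilon_t\tilde{\eta}_t] = 0$, and $(\varepsilon_t, \tilde{\eta}_t')'$ i.i.d.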

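From $a_t$ and $P_t$ known at time t − 1, which are the conditional mean and
variance of $\alpha_t$ for Gaussian state space models, the Kalman filter provides the
best linear predictor $l_{t-1,1} = Z_t a_t$ and

$$v_t = l_t - Z_t a_t, \quad F_t = Z_t P_t Z_t' + H_t, \quad K_t = T_{t+1}P_t Z_t' F_t^{-1},$$
$$a_{t+1} = T_{t+1}a_t + K_t v_t, \quad P_{t+1} = T_{t+1}P_t(T_{t+1} - K_t Z_t)' + Q_{t+1},$$

as in Sandmann and Koopman (1998), whose notation differs slightly from that in
the reference text by Harvey (1989). These recursive equations can be initialized
with

$$a_1 = \begin{pmatrix} 0 \\ \alpha + \mu_\xi \end{pmatrix} \quad\text{and}\quad
P_1 = \begin{pmatrix} \sigma_\eta^2/(1 - \phi^2) & 0 \\ 0 & 0 \end{pmatrix}.$$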

The preceding equations simplify because of the zeros in the matrices $Q_t$ and
$T_t$. In particular, the filter calculations can be reduced to

$$v_t = l_t - l_{t-1,1},$$
$$l_{t,1} = (1 - \phi)(\alpha + \mu_\xi) + \phi l_t - \phi H_t v_t/F_t,$$
$$F_{t+1} = H_{t+1} + \phi^2(F_t - H_t)H_t F_t^{-1} + \sigma_\eta^2, \qquad (11.64)$$

from which the quasi-likelihood can be calculated, as stated in (11.41), commencing
with $l_{0,1} = \alpha + \mu_\xi$ and $F_1 = H_1 + \sigma_\eta^2/(1 - \phi^2)$.
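
As a check on the algebra, the sketch below runs the reduced recursions (11.64) alongside the full matrix form of the filter and confirms that they produce the same prediction errors and variances. It assumes normally distributed $u_t$, for which $\mu_\xi = E[\log|u_t|] \approx -0.6352$ and $\sigma_\xi^2 = \pi^2/8$; the function names are illustrative, and the final function uses the usual Gaussian prediction-error form of the log-likelihood, which is an assumption here since (11.41) is not reproduced in this appendix.

```python
import numpy as np

MU_XI = -0.6352          # E[log|u|] for standard normal u (approximate)
H_XI = np.pi ** 2 / 8    # var(log|u|) for standard normal u


def sv_filter_scalar(l, alpha, phi, sigma_eta2, mu_xi=MU_XI, H=H_XI):
    """Prediction errors v_t and their variances F_t from (11.64)."""
    v = np.empty(len(l))
    F = np.empty(len(l))
    pred = alpha + mu_xi                        # l_{0,1}
    Ft = H + sigma_eta2 / (1.0 - phi ** 2)      # F_1
    for t in range(len(l)):
        v[t] = l[t] - pred
        F[t] = Ft
        pred = (1.0 - phi) * (alpha + mu_xi) + phi * l[t] - phi * H * v[t] / Ft
        Ft = H + phi ** 2 * (Ft - H) * H / Ft + sigma_eta2
    return v, F


def sv_filter_matrix(l, alpha, phi, sigma_eta2, mu_xi=MU_XI, H=H_XI):
    """The same filter written with the 2x2 state space matrices."""
    Z = np.array([[1.0, 1.0]])
    T = np.array([[phi, 0.0], [0.0, 1.0]])
    Q = np.array([[sigma_eta2, 0.0], [0.0, 0.0]])
    a = np.array([0.0, alpha + mu_xi])
    P = np.array([[sigma_eta2 / (1.0 - phi ** 2), 0.0], [0.0, 0.0]])
    v = np.empty(len(l))
    F = np.empty(len(l))
    for t in range(len(l)):
        v[t] = l[t] - (Z @ a)[0]
        F[t] = (Z @ P @ Z.T)[0, 0] + H
        K = (T @ P @ Z.T)[:, 0] / F[t]
        a = T @ a + K * v[t]
        P = T @ P @ (T - np.outer(K, Z[0])).T + Q
    return v, F


def gaussian_quasi_loglik(v, F):
    """Prediction-error decomposition of a Gaussian log-likelihood."""
    return -0.5 * np.sum(np.log(2.0 * np.pi) + np.log(F) + v ** 2 / F)
```

The scalar version needs only a few multiplications per observation, which is why the reduced form is convenient when the quasi-likelihood has to be evaluated many times inside an optimizer.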

The filtering equations are the same for the standard SVt model, although $\mu_\xi$ and
$H_t = \sigma_\xi^2$ must be replaced respectively by the mean and variance given in (11.45)
and (11.46). For the more general set-up when there is a nonzero correlation
between $u_t$ and either $\eta_t$ or $\eta_{t+1}$, the Kalman filter can be applied either as above
or to the same process conditional on the signs of excess returns. See Sandmann
and Koopman (1998) and Harvey and Shephard (1996) for the relevant formulae
when the information in the signs is used, respectively for dependence of $u_t$ on
$\eta_t$ and of $u_t$ on $\eta_{t+1}$.


Part IV 


High-Frequency Methods 

High-Frequency Data and Models


Prices recorded several times each hour generate large datasets. Several properties 
and applications of these high-frequency datasets are described in this chapter, 
for both equity and foreign exchange markets. Special attention is given to the 
more precise volatility estimates obtained from high-frequency return data. 


12.1 Introduction


High-frequency is the adjective used to indicate that prices are recorded more often 
than daily. The more prices a day, the higher is the frequency of the observations. 
Complete datasets contain all prices and/or quotes, for which Engle (2000) uses the 
phrase ultra-high frequency. Most research, however, employs regularly sampled 
data and the most common frequency is probably one price every five minutes. 

High-frequency data have their advantages but they present new challenges. 
On the plus side, the additional price observations allow us to learn more about 
how prices react to information. More observations also enable us to estimate and 
forecast volatility more accurately, which benefits derivatives traders and risk and 
portfolio managers. On the other hand, microstructure effects (such as the spread 
between buying and selling prices) become more important, intraday patterns in 
trading behavior have to be modeled and the size of datasets can become daunting. 
Nevertheless, analysis of high-frequency data is rewarding and well worth the 
additional effort. 

It is more difficult to obtain cheap high-frequency data than it is to obtain
comparable daily data. There were few high-frequency studies before the 1990s, notable
examples being Wood, McInish, and Ord (1985), Harris (1986), and Kawaller,
Koch, and Koch (1987) for the US equity market. The problem of data
availability disappeared when Olsen & Associates (O&A) gave away a year of their
ultra-high-frequency exchange rate data, leading to several studies that were
presented at the O&A conference in 1995. Section 12.2 describes these data and their
properties are a recurring subject in this chapter. The results of fifteen years of
FX research by the O&A organization are presented in the impressive book by
Dacorogna, Gençay, Müller, Olsen, and Pictet (2001). The most studied
high-frequency equity data are probably those in the Trade and Quotation database of
NYSE, AMEX, and NASDAQ prices, which can be bought from the New York
Stock Exchange.

Some general features of high-frequency data are covered in Section 12.2, 
including the market microstructures that define price data, the selection of a 
frequency, and the aforementioned O&A dataset. A single day of stock index 
prices is then discussed in Section 12.3, to illustrate some characteristics of ultra- 
high-frequency price records. 

The stylized facts for high-frequency returns are similar to but distinct from 
those for daily returns. Five general facts are described in Section 12.4. Of partic- 
ular importance are variations in the average level of volatility throughout the day, 
some of which can be explained by major macroeconomic news announcements. 
Estimates of the intraday volatility pattern are discussed in Section 12.5. These 
estimates appear in some of the intraday methodologies for modeling volatility 
that are covered in Section 12.6. Intraday trading rules are reviewed quickly in 
Section 12.7. 

Volatility can be estimated and modeled more precisely by using high-frequency 
returns data. Recent research has focused on the properties of realized volatility, 
which is defined as the square root of the sum of the squares of intraday returns. 
Section 12.8 outlines the theory of this volatility estimate and then Section 12.9 
covers its empirical properties in some detail. The evidence for long memory 
effects in volatility is found to be particularly striking when high-frequency data 
are used. 
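
The definition of realized volatility translates directly into code. The following minimal sketch takes one day of intraday prices, forms log returns, and returns the square root of the sum of their squares; the function names are illustrative and no adjustment is made for microstructure effects or overnight returns.

```python
import math


def log_returns(prices):
    """Changes in the logarithm of consecutive prices."""
    return [math.log(b) - math.log(a) for a, b in zip(prices, prices[1:])]


def realized_volatility(intraday_returns):
    """Square root of the sum of squared intraday returns."""
    return math.sqrt(sum(r * r for r in intraday_returns))


if __name__ == "__main__":
    prices = [6785.0, 6790.5, 6781.0, 6802.0, 6786.5]   # made-up prices
    print(realized_volatility(log_returns(prices)))
```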

Some studies that assess the impact of information on prices are reviewed in 
Section 12.10. Comparisons of the rates at which different markets reflect the 
same information in their prices are of particular interest. Models for the times 
that elapse between trades are described in Section 12.11. The distribution of 
extreme returns, at all frequencies, is the subject of Section 12.12. 

Sometimes only summary values for the intraday price record are known. Sec- 
tion 12.13 covers volatility estimation using intraday high and low prices. Finally, 
Section 12.14 concludes this high-frequency chapter. 


12.2 High-Frequency Prices 
12.2.1 Microstructure 


Trading at financial markets around the world is organized in several ways and the 
precise microstructure of a market often determines the form of available data. 
Equities have been primarily traded at exchanges that have a physical location, 
while most currency trading has been over-the-counter (OTC) consisting of inter- 
bank deals. Whether exchange traded or OTC, asset prices may be determined by 
either a quote-driven or an order-driven structure. These structures differ in the 
method that establishes a price between buyers and sellers. Quotes are issued by 
market makers who are prepared to be both buyers and sellers of the asset. Either
there is a monopolist specialist who makes the market or the task is competi- 
tive. Alternatively, orders can be matched automatically by electronic systems, 
thereby removing intermediaries from the dealing process. Trading mechanisms 
do change and recent years have seen much more electronic trading and more use 
of order-driven structures. 

Goodhart and O'Hara (1997) includes a detailed discussion of microstruc- 
ture and consequences for high-frequency research, which supplements the text- 
book on microstructure by O'Hara (1995). Gourieroux and Jasiak (2001) provide 
numerical examples of the operation of markets at the micro level. 


12.2.2 Price Types 


Price data are often only available as either quotes or transaction prices. Quotes 
may be firm up to a specified amount or they may only be indicative so that further 
negotiation is required if a quotation is acceptable to some counterparty. A market 
maker usually quotes both a bid and an ask price, and is willing to sell at the ask 
a and to buy at the bid b. The spread, a — b, provides an income to the market 
maker in return for supplying liquidity, which is a risky activity. At order-driven 
markets a typical order book contains a set of limit orders from which the most 
competitive bid and ask prices are determined. 

The best databases include both a and b, although some only provide the mid- 
point, (a + b)/2. The midpoint need not reflect beliefs about the current value of 
the asset when quoters prefer to trade one side of the market. Spreads emphasize 
that assets have more than one price for those people who wish to trade immedi- 
ately, with immediate buyers paying more than immediate sellers receive for the 
same goods. Transactions need not occur at an endpoint of the spread if a deal 
can be agreed at some interior point. 

Transaction price datasets can reflect a spread even when the market is order- 
driven, because orders can be of varying types with some requiring an immediate 
transaction that will generally occur at a worse price. Transaction prices are often 
recorded without supplementary information about the best bid and ask
contemporaneously on offer. Prices may exhibit a bid–ask bounce effect for a period
of time, during which the best bid and ask are constant with transactions prices 
bouncing between the two levels. 

High-frequency datasets include prices and the times at which they are recorded, 
often accurate to the nearest second. They may also include quantities traded. 

Errors are more likely to occur in the very large datasets that arise when infor- 
mation is collected about all trades and/or quotes. The impact of errors is more 
severe than in daily price records, since intraday returns have smaller standard 
deviations and hence return outliers created by price errors are more extreme. 
Dacorogna et al. (2001) describe algorithms for detecting suspect prices. Their 
Table 4.6 shows that fewer than 0.1% of recent Reuters real-time data for major
FX rates were classified as outliers. In a study of FTSE futures transactions, Areal 
and Taylor (2002) found 56 suspicious prices in a set of 2.85 million, by inspecting 
the major return outliers, but could not, of course, expect to find all incorrectly 
recorded prices. 


12.2.3 Selection of a Frequency 


Complete records of transactions and/or quotes are usually so numerous that 
they are sampled at some intraday frequency. Higher frequencies provide more 
information but contain relatively more noise from the bid–ask spread. Sampling
prices every five minutes is probably the most popular choice, following Andersen 
and Bollerslev (1997b). 

In volatility studies it is often appropriate to choose the frequency to avoid 
bias caused by microstructure effects. This motivates the selection of the thirty- 
minute frequency in some of the research by Andersen, Bollerslev, Diebold, and 
Labys (2000, 2003). In contrast, Bandi and Russell (2004a,b) find that the optimal 
frequency is near to five minutes. 


12.2.4 Clocks and Trading Hours 


Precise times are important for intraday studies. Clocks are reset twice a year 
in the US and Europe, but not in Japan. When the most important US macro- 
economic news is released in the North American winter, it is 07:30 Central 
Standard Time (CST) in Chicago, 08:30 Eastern Standard Time (EST) in New 
York, 13:30 Greenwich Mean Time (GMT) in London, 14:30 in Paris, and 22:30 
in Tokyo. In the summer these times are the same in the US, but are usually 
13:30 British Summer Time in London and always 21:30 in Tokyo. A further 
complication is that the US and the UK change their clocks at different times in 
spring and autumn so there are a few days every year when London is not five 
hours ahead of New York. 

The OTC foreign exchange market never shuts—it is a 24-hour market. Stock 
markets open during local business hours. These times may well change in the 
future, but recent local times have been from 09:30 to 16:00 in New York, from 
08:00 to 16:30 in London and from 09:00 to 15:00 in Tokyo, with slightly different 
times for index futures trading. 


12.2.5 The FX Database of Olsen & Associates 


Much high-frequency research has been stimulated by the one-year database of 
spot DM/$ and yen/$ quotations collected and distributed by Olsen & Asso- 
ciates, which will be called the O&A database throughout this chapter. Published 
research includes Acar and Lequeux (1999), Andersen and Bollerslev (1997a,b,
1998a), Chang and Taylor (1998), Danielsson and de Vries (1997), DeGennaro
and Shrieves (1997), Engle and Russell (1997), Ghysels et al. (1998), Kanzler
(1998), Peiers (1997), Ramsey and Zhang (1997), Taylor and Xu (1997), and
Zhou (1996). Most attention has been given to the DM/$ dataset, which contains
more than 1 400 000 quotations on the interbank network for the year between
October 1992 and September 1993 inclusive, time stamped using GMT.

Figure 12.1. FTSE 100, March futures trades on 22 December 1999.

This dataset is a good example of the massive amount of available data and 
of their limitations. The quotes are merely indicative. Their precise source is 
unclear. It is widely believed that almost all Reuters quotations are included, 
but it is less well known firstly that some of the data came from Knight Ridder 
and Telerate and secondly that there are nine gaps in the data due to technical 
failures, each lasting several hours (Kanzler 1998). Less than 0.1% of the quotes
are made between 21:00 GMT on Friday and 21:00 GMT on Sunday. Following 
Andersen and Bollerslev (1997b), most researchers ignore this weekend period. 
The quotation rate is also low on Christmas Day and New Year's Day. 


12.3 One Day of High-Frequency Price Data


We consider one day of London futures prices for the FTSE 100 index to illus- 
trate some of the general characteristics of price records that are intended to be 
complete. The day selected is Friday, 22 December 1999, when futures trading
was essentially electronic and order-driven; any very large deals at the “upstairs” 
market are not recorded in the illustrative dataset. 

A total of 1741 electronic transaction prices, times, and volumes were recorded
for the March 2000 contract by the LIFFE exchange during the nine-and-a-half
hours that the futures market was open, from 08:00 to 17:30 GMT. There were 3.0
transactions per minute on average, with a median interval between transactions
of five seconds. Twenty of the inter-transaction times exceed three minutes and the
longest is almost seven minutes. The transactions rate was particularly high around
the market open, with 197 between 08:00 and 08:30. It was also relatively high
in the hour after US equity markets opened at 14:30 GMT, with 323 transactions
in that hour. More than two-thirds of the transactions were for either one or two
contracts, while the largest decile has trades of ten or more contracts.

Figure 12.2. Bids, asks, and trades for fifteen minutes.

Figure 12.1 shows each transaction price during the day. The market opened 
at 6785 and traded between 6772 and 6827, with a final price of 6786.5 eleven 
seconds before the market closed. These prices have a tick size of 0.5, although 
only 19% of them are not integers. Some 45% of the prices were identical to the 
preceding price and 17% of the price changes were one tick. 

The records sold by LIFFE also include 2206 bid and 2162 ask prices on the 
same day, which are the most competitive prices at the time they were submitted 
to the order book. A closer look at some of the prices can be obtained from 
Figure 12.2, which shows fifteen minutes of bid, ask, and transaction prices, 
commencing at 15:45. The order prices are shown by small dots and joined by 
lines, while the transaction prices are shown by large dots and are not joined 
together. One-third of the prices are for trades, with 52 trades at the bid price and 
22 at the ask. The first eight trades are at the bid, although fewer than eight dots
can be seen for these trades as some are recorded at identical times. After them, 
the next five trades are at the ask. 

Only a fifth of the bid and ask order prices can be matched using the criterion that 
they occur at identical times. Figure 12.3 shows the 437 values of the spread that 
can be calculated from contemporaneous orders. The mode, median, and average 
of these spreads are respectively 1, 2, and 2.2, with a range from 0.5 to 12. 
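
The matching criterion used for Figure 12.3 can be sketched in a few lines of Python. The function name and the sample quotes below are hypothetical illustrations, not the LIFFE records. The sketch assumes each quote is a (time, price) pair and that, when a time is repeated, the last quote recorded at that time is the relevant one.

```python
from statistics import mean, median, mode

def contemporaneous_spreads(bids, asks):
    """Return ask minus bid for every timestamp that has both a bid and an ask.

    bids, asks: lists of (timestamp, price) pairs. If a timestamp repeats,
    the last quote recorded at that time is used (an assumption).
    """
    last_bid = dict(bids)   # later duplicates overwrite earlier ones
    last_ask = dict(asks)
    common = sorted(set(last_bid) & set(last_ask))
    return [last_ask[t] - last_bid[t] for t in common]

# Hypothetical quotes: (seconds after the open, price in index points).
bids = [(0, 6785.0), (5, 6785.5), (9, 6786.0), (14, 6786.0), (20, 6785.0)]
asks = [(0, 6786.0), (7, 6787.0), (9, 6787.0), (14, 6788.0), (21, 6787.0)]

spreads = contemporaneous_spreads(bids, asks)
print(spreads)
print(mode(spreads), median(spreads), mean(spreads))
```

Only three of the five bid times also carry an ask in this toy example, which mirrors the observation that most order prices cannot be matched by the identical-time criterion.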


12.4 Stylized Facts for Intraday Returns 


Intraday returns have stylized facts that are similar to but distinct from those
presented for daily returns in Chapter 4. General statements that apply to almost
all high-frequency datasets must take account of the variation in frequencies and
microstructure effects found in such datasets. Five general statements are made
in this section.

Figure 12.3. Bid-ask spreads.


12.4.1 Returns 


An intraday return is defined in exactly the same way as a daily return. It is simply 
the change in the logarithm of the price during an interval of time, assuming 
there are no dividend payments during the interval. Suppose we consider five- 
minute returns. Then we require one price for each five-minute interval and it is 
conventional to use the last price from the interval. When the price data are bid 
and ask prices, it is normal to calculate returns from midpoint prices. 

Reliance on latest prices will lead to returns that are measured over periods 
that are not exactly five minutes as many of the prices will not be recorded at 
exactly the end of an interval. This will be unimportant when the average time 
between available prices is short, say a minute or less. There may be no prices in 
some intervals, in which case it is necessary to use the most recent price. As an 
alternative to using latest prices, a return over exactly five minutes can be obtained 
by linear interpolation between the last price in an interval and the first price in 
the next interval, as in Andersen and Bollerslev (1997b), although this can create 
spurious predictability. 
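
A minimal Python sketch of the latest-price convention follows. The function name, the time unit (seconds), and the sample ticks are assumptions made for illustration, and the interpolation alternative is not implemented.

```python
import math

def interval_returns(ticks, start, end, width=300):
    """Log returns over consecutive intervals of `width` seconds.

    ticks: list of (time_in_seconds, price) pairs sorted by time.
    Each interval is priced at the last tick at or before its end; an
    interval without ticks reuses the most recent earlier price.
    Assumes at least one tick is recorded at or before `start`.
    """
    prices, last, i = [], None, 0
    boundary = start
    while boundary <= end:
        # consume every tick recorded up to this interval boundary
        while i < len(ticks) and ticks[i][0] <= boundary:
            last = ticks[i][1]
            i += 1
        prices.append(last)
        boundary += width
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]

# Hypothetical ticks: no trade occurs between 300 and 600 seconds.
ticks = [(0, 100.0), (120, 101.0), (290, 100.5), (610, 102.0), (900, 102.0)]
returns = interval_returns(ticks, start=0, end=900)
print(returns)  # the middle return is zero because the stale price is reused
```

The first return is measured from the tick at 0 seconds to the tick at 290 seconds, so it covers slightly less than five minutes, as described above.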

Equity and many other markets are not open continuously. Overnight and week- 
end returns are often ignored in high-frequency studies as they occur over much 
longer intervals of time and must have different statistical properties to intraday 
returns. Closed-market returns are nevertheless often relevant. For example, Areal
and Taylor (2002) note that the aggregate intraday return from a long position in
FTSE 100 futures is −3% per annum from 1990 to 1998 but the aggregate return,
including closed-market periods, is 12% per annum. Thus all the equity risk
premium was earned when the market was closed during these years. They also
find that more than 30% of the variance of daily returns is attributable to the
hours when the market is closed.


12.4.2 The Distribution of Intraday Returns 


The means and variances of intraday returns are necessarily small numbers. Mean 
returns are similar across the trading day. There is some evidence for higher 
average returns around the open and the close of US equity markets, given by 
Harris (1986) and Andersen and Bollerslev (1997b) respectively for cash and 
futures markets. Variances vary considerably within days and we discuss the 
intraday volatility pattern in detail in the next section. 

The shape of the distribution is leptokurtic and more so than for daily returns. 
We adapt the first major stylized fact for daily returns to: 


1. Intraday returns have a fat-tailed distribution, whose kurtosis increases as 
the frequency of price observations increases. 


This result is to be expected when returns have a finite fourth moment. It is then 
almost inevitable when intraday returns are uncorrelated and daily returns have 
excess kurtosis, while the central limit theorem implies the distribution converges 
to the normal as the inter-price interval increases. 
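
A benchmark calculation shows why kurtosis must fall as returns are aggregated. Suppose, as a simplifying assumption that is stronger than uncorrelatedness, that a longer-horizon return is the sum of n i.i.d. intraday returns with variance \(\sigma^2\) and kurtosis \(k_1\); these symbols are introduced only for this illustration. Fourth cumulants and variances are both additive for sums of independent variables, so the kurtosis \(k_n\) of the n-period return satisfies

```latex
\[
  k_n - 3 \;=\; \frac{n\,(k_1 - 3)\,\sigma^4}{(n\,\sigma^2)^2} \;=\; \frac{k_1 - 3}{n}.
\]
```

Excess kurtosis then shrinks in proportion to 1/n. Volatility clustering violates the i.i.d. assumption, so empirical kurtosis estimates need not decline at this benchmark rate.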

Dacorogna et al. (2001, Table 5.1) document a kurtosis of 38 for ten-minute 
DM/$ returns from 1987 to 1993 and declining values for decreasing frequencies: 
27 for hourly returns, 12.4 for six-hourly returns, 6.3 for daily returns, and 3.7 
for weekly returns. They also find the skewness of the distributions is near zero. 
Andersen and Bollerslev (1997b) report summary statistics for S&P 500 futures 
from 1986 to 1989, but excluding a month around the 1987 crash. Their kurtosis 
estimates are 29, 33, and 16 for 5-, 25-, and 200-minute returns, with skewness 
estimates −0.6, −1.8, and −1.5 for these frequencies. Areal and Taylor (2002)
estimate the kurtosis as 25 for five-minute returns from FTSE 100 futures between 
1990 and 1998. 

The distributions of high-frequency returns often show a sharp spike at zero, 
which can simply reflect the feasible set of discrete prices rather than few trades 
per interval. For example, some 22% of the five-minute returns of Areal and 
Taylor (2002) are zero but less than 3% of their five-minute intervals contain no 
transactions. 


12.4.3 Autocorrelations of Intraday Returns 


More dependence might be anticipated in intraday returns than in daily returns 
for two reasons. First, bid-ask bounce in transaction prices or a midpoint bounce 
induced by dealers with order imbalances will show most clearly at higher frequen- 
cies. The negative autocorrelation created by bouncing prices is proportional to 
the variance of the spread divided by the variance of returns (see equation (6.18));
the numerator is constant while the denominator decreases as the frequency of
returns increases. Second, the exploitation of any dependence is more difficult 
when expected profits per trade decline as data frequency increases, but costs do 
not. The magnitude of observed dependence is, however, often remarkably small. 
Our second stylized fact is: 


2. Intraday returns from traded assets are almost uncorrelated, with any 
important dependence usually restricted to a negative correlation between 
consecutive returns. 
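
The first argument can be illustrated with a stylized bounce model of the kind attributed to Roll (1984). It is a sketch under stated assumptions and is not claimed to reproduce equation (6.18). We assume that efficient-price changes are uncorrelated with variance proportional to the interval length, that the spread is constant, and that each transaction occurs at the bid or the ask with equal probability, independently across transactions. Observed price changes then have variance equal to the efficient variance plus half the squared spread, and first-lag autocovariance equal to minus one quarter of the squared spread. The function and parameter names are hypothetical.

```python
def bounce_autocorrelation(spread, var_per_minute, minutes):
    """First-lag autocorrelation of price changes in a stylized bounce model.

    spread: constant bid-ask spread (price units).
    var_per_minute: variance of efficient-price changes per minute (assumed).
    minutes: length of the return interval.
    """
    var_true = var_per_minute * minutes      # grows with the interval length
    autocov = -spread ** 2 / 4.0             # created by the bounce alone
    var_obs = var_true + spread ** 2 / 2.0   # efficient variance plus bounce
    return autocov / var_obs

# Hypothetical inputs: spread of one price unit, unit variance per minute.
rhos = [bounce_autocorrelation(1.0, 1.0, m) for m in (1, 5, 30, 60)]
print([round(r, 4) for r in rhos])
```

The bounce term does not depend on the interval length, so the autocorrelation moves towards zero as the interval lengthens and is most negative at the highest frequency.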


Some estimates of negative first-lag autocorrelation for foreign exchange returns
from quotes are (i) around −0.18 for a few days of one-minute returns
(Goodhart and Figliuoli 1991), (ii) −0.040 for one year of five-minute DM/$
returns (Andersen and Bollerslev 1997b), with −0.070, −0.082, and −0.043 for
10-, 20-, and 30-minute returns, and (iii) −0.108 for one year of five-minute yen/$
returns (Chang and Taylor 2003), with −0.093, −0.066, and −0.018 for 10-, 30-,
and 60-minute returns. The autocorrelations of the longer time series of six years
of one-minute DM/$ returns graphed by Dacorogna et al. (2001, Figure 5.1) are −0.16
at lag 1, −0.02 at lag 2, −0.01 at lag 3, and are thereafter negligible. Much more
dependence is found in tick-by-tick returns by Zhou (1996).

Estimates of the first-lag autocorrelation for returns from equity index futures 
are almost zero. A value of 0.009 is given for four years of five-minute S&P 500 
returns by Andersen and Bollerslev (1997b), increasing to 0.039 for hourly returns, 
while 0.001 is the value for eight years of five-minute FTSE 100 returns in Areal 
and Taylor (2002). Substantial positive autocorrelation can occur, however, for 
nontraded assets such as spot indices whose component prices are not continually 
updated. Stoll and Whaley (1990) report first-lag autocorrelations for five-minute 
returns equal to 0.24 for the Major Market Index (MMI) of 20 highly active stocks 
and 0.45 for the broad-based S&P 500 index, from 1984 to 1986. 

Stale prices provide a satisfactory explanation for spot index dependence as pos- 
itive dependence is not found in the returns from the component stocks. Instead, 
several researchers have reported significant negative dependence for individual 
stock returns. One example is the median first-lag autocorrelation of −0.21 for
five-minute returns from the thirty stocks in the DJIA index, from 1993 to 1998
(Andersen, Bollerslev, Diebold, and Ebens 2001). More extreme examples are
median estimates of −0.27 and −0.48 for daily sets of one-minute transaction
returns, respectively for IBM traded at the NYSE and Intel traded at NASDAQ in 
1994 (Lin, Knight, and Satchell 1999). 


12.4.4 Autocorrelations of Intraday Absolute Returns 


The significant positive autocorrelation among daily absolute returns across many 
lags can be attributed to volatility clusters. There will be similar clusters in intraday
returns but their durations may appear to be shorter if there is an interaction with
a strong intraday volatility pattern. The third stylized fact is revised for intraday
returns to:


3. There is substantial positive dependence among intraday absolute returns,
which occurs at many low lags and also among returns separated by an
integer number of days.


Figure 12.4. Autocorrelations for DM/$ thirty-minute absolute returns.


A clear example is given by Andersen and Bollerslev (1997a, Figure 2, 1997b, 
Figure 4a) for the O&A year of five-minute absolute DM/$ returns. There it is seen 
that the autocorrelations commence with 0.31 at lag 1, decline to −0.02 at lag 144
(twelve hours) and then rise to 0.15 at lag 288 (one day). The U-shaped pattern is 
then repeated for lags 289 to 576 with a peak autocorrelation of 0.14 at lag 576 
(two days). Peaks then recur at multiples of 288 lags with the autocorrelation at 
these peaks declining slowly. A 48-hour weekend period is removed prior to these 
calculations. 
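
The calculation behind such plots is the ordinary sample autocorrelation applied to absolute returns. The sketch below uses a deterministic toy series with N = 12 periods per day and a U-shaped volatility pattern; all names and values are hypothetical. The toy series isolates the periodic component and contains no volatility clustering, so its autocorrelations are far larger in magnitude than those reported above.

```python
import math

def autocorrelation(x, lag):
    """Sample autocorrelation of the sequence x at a positive lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return num / denom

N = 12                                   # hypothetical periods per day
scale = [1.0 + 0.8 * math.cos(2 * math.pi * j / N) for j in range(N)]
# 50 identical "days"; signs alternate so that the returns average zero
returns = [(-1) ** j * 0.001 * scale[j] for _ in range(50) for j in range(N)]
abs_returns = [abs(r) for r in returns]

rho = {lag: autocorrelation(abs_returns, lag) for lag in (1, N // 2, N)}
print(rho)
```

The autocorrelation is negative at half a day and peaks again at one day, which is the U-shaped repetition described in the text.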

Figure 12.4, from Chang and Taylor (2003), shows these peaks for the same 
dataset when the frequency is thirty minutes; the autocorrelations are joined by 
solid lines. 

Dacorogna et al. (1993, Figure 4, 2001, Figure 7.4) include the weekend in 
their calculations for four years of twenty-minute absolute DM/$ returns. The 
seasonal pattern is then across weeks instead of days and there are autocorrela- 
tions of 0.40 at lag 1, 0.15 at lag 72 (one day), and 0.24 at lag 504 (one week). 
Negative autocorrelations occur in these studies, at lags around one-half the sea- 
sonal period, because intraday volatility variation then dominates the persistence 
in daily volatility. 

Similar features are found in intraday equity absolute returns. Andersen and
Bollerslev (1997b, Figure 4b) find all of the first 400 autocorrelations are positive
for their five-minute data for S&P 500 futures. They commence with 0.29 at lag 1,
fall to 0.07 at lag 40 (half a trading day), and rise to 0.14 at lag 80 (one day),
again with a clear U-shaped pattern that repeats once a day. The general pattern
can be seen in Figure 12.5, which shows the autocorrelations (as a solid line) for
five-minute returns on the spot S&P 100 index (from July to October 1999) up to
lag 154 (two days).

Figure 12.5. Autocorrelations for intraday absolute S&P returns. (Vertical axis: correlation; horizontal axis: lag.)

The periodic (or diurnal) behavior of intraday volatility, visible in the auto- 
correlations of absolute returns, was first identified by calculating the standard 
deviations of returns at the NYSE for intraday periods. For example, the pioneer- 
ing study of Wood et al. (1985) found volatility was highest around the open and 
the close. The periodic effect is so marked that it is our fourth stylized fact: 


4. The average level of volatility depends on the time of day, with a significant 
intraday variation. 


Methods for estimating this important effect are described in the next section. 

Incidentally, Müller, Dacorogna, and Pictet (1998) calculate the first ten
autocorrelations of \(|r_t|^p\) for various powers p, from thirty-minute DM/$ returns. The
maximum dependence occurs when p is near one-half, with similar autocorrela- 
tions for absolute returns that are more than twice the values for squared returns. 
The same results can be seen in Dacorogna et al. (2001, Figure 5.11). 


12.4.5 The Impact of Macroeconomic News on Volatility 


Markets that are open respond rapidly to official US macroeconomic announce- 
ments. The most important monthly US announcements are made at 08:30 EST 
and less important news releases are made at later times. Ederington and Lee 
(1993, 1995) consider the price impact of reports about nineteen macroeconomic 
variables for Treasury bond, Eurodollar and Deutsche mark futures, whose markets
opened at 08:20 EST during their studies.

Their first paper compares five-minute return standard deviations for announcement
and nonannouncement days, with an announcement day defined as a day for
which at least one of the nineteen reports was released. They find that the return 
from 08:30 to 08:35 on announcement days is much more volatile than, firstly, 
all other five-minute returns on announcement days and, secondly, all returns on 
nonannouncement days. The standard deviation of 08:30/08:35 announcement 
returns is four to five times the corresponding value on nonannouncement days. 
Furthermore, the announcement effects are most pronounced on Fridays. 

Prices adjust very rapidly to the news releases. Analysis of tick data in the second 
paper shows that most of the reaction occurs within forty seconds, although there 
is higher volatility for another fifteen minutes or so. The report with the greatest 
impact on interest rates from 1988 to 1991 was the employment report (issued 
on Fridays), followed by reports on the Producer Price Index (PPI), the consumer 
price index, and durable goods orders. The employment report also had the most 
impact on exchange rates, followed by reports on the merchandise trade deficit, 
PPI, durable goods, GNP, and retail sales. 

Further US evidence is provided by Andersen and Bollerslev (1998a) and Fleming
and Remolona (1999). Macroeconomic news released in other countries also
has a significant impact upon local volatility. Examples include Ito and Roley
(1987) for Japan, Becker, Finnerty, and Kopecky (1993, 1995) for the UK, and
Andersen and Bollerslev (1998a) for Germany. Generally both US and domestic
announcements are found to be important at non-US markets. Our final stylized
fact is:


5. There are short bursts of high volatility in intraday prices that follow major 
macroeconomic announcements. 


12.5 Intraday Volatility Patterns 
12.5.1 Estimation 


Intraday volatility patterns are often modeled by multiplicative factors, which 
may vary across the days of the week. We outline estimation methods when the 
periodic pattern is diurnal, so it repeats every day; these methods can easily be 
adapted for more general patterns that repeat once a week. We suppose the return
\(r_t\) for day t is the sum of N intraday returns \(r_{t,j}\), \(1 \le j \le N\). If the market has
a closed period, then j = 1 represents that period and \(r_{t,1}\) is the return from the
close on day t − 1 to the open on day t. The latent volatility for day t is denoted
by \(\sigma_t\) and we have

\[ r_t = \sum_{j=1}^{N} r_{t,j} \quad \text{and} \quad \operatorname{var}(r_t \mid \sigma_t) = \sigma_t^2. \tag{12.1} \]

Intraday volatility factors \(\lambda_j\) are defined by supposing

\[ \operatorname{var}(r_{t,j} \mid \sigma_t) = \lambda_j \sigma_t^2 \quad \text{with} \quad \sum_{j=1}^{N} \lambda_j = 1. \tag{12.2} \]


Thus \(\lambda_j\) is the proportion of a trading day's return variance that is attributed to
period j, here assuming that intraday returns are uncorrelated and that the factors
are the same for all days t. When there is a closed market period, the proportions
of open-market variance are defined by

\[ \kappa_j = \frac{\lambda_j}{1 - \lambda_1}, \quad j \ge 2, \quad \text{with} \quad \sum_{j=2}^{N} \kappa_j = 1. \tag{12.3} \]

The above factors sum to one. Another convention is to make their average one,
which is relevant when defining intraday ARCH models. Factors that average one
are defined by

\[ \lambda_j^{*} = N \lambda_j. \tag{12.4} \]


Simple estimates of the variance proportions, when expected returns can be
assumed to be zero, are given by

\[ \hat{\lambda}_j = \frac{\sum_{t \in S} r_{t,j}^2}{\sum_{t \in S} \sum_{k=1}^{N} r_{t,k}^2}, \tag{12.5} \]

following Taylor and Xu (1997). The above summations are over all days t in some
set S. The set S might be all days or might be some subset, for example, all Fridays
or all Fridays that have a macroeconomic news release. The simple estimates can
be sensitive to outliers and it may be preferable to make the estimates a smooth
function of the intraday time j. Andersen and Bollerslev (1997b) recommend
flexible Fourier functions (FFFs) whose time-invariant specification is

\[ \tilde{\lambda}_j = \exp\Bigl( \mu + a_1 j + a_2 j^2 + \sum_{i=1}^{D} b_i I_{\{j = d_i\}} + \sum_{k=1}^{P} \Bigl[ \gamma_k \cos\Bigl( \frac{2\pi jk}{N} \Bigr) + \delta_k \sin\Bigl( \frac{2\pi jk}{N} \Bigr) \Bigr] \Bigr), \tag{12.6} \]

with D dummy variables for intervals that do not fit into a smooth pattern (perhaps
because of news announcement effects) and with P sinusoidal functions.
Their more general specification permits an interaction between \(\sigma_t\) and the periodic
pattern. Smooth estimates \(\tilde{\lambda}_j\) are given by regressing \(\hat{\lambda}_j\) on the explanatory
variables in (12.6) or by the more sophisticated methods described by Andersen
and Bollerslev (1997b, 1998a) and Martens, Chang, and Taylor (2002).
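
The simple estimates and the two rescalings can be sketched as follows. The sketch assumes zero expected returns and takes each proportion to be the share of the summed squared returns that falls in period j; the function names and the three days of data are hypothetical, with the first period standing for the closed market.

```python
def variance_proportions(returns_by_day):
    """Share of total squared returns in each intraday period (sums to one).

    returns_by_day: list of days, each a list of N intraday returns;
    index 0 is the closed-market return when there is one.
    """
    n = len(returns_by_day[0])
    sums = [sum(day[j] ** 2 for day in returns_by_day) for j in range(n)]
    total = sum(sums)
    return [s / total for s in sums]

def open_market_proportions(lam):
    """Rescale the open-market proportions (periods 2..N) to sum to one."""
    open_total = sum(lam[1:])        # equals one minus the closed-market share
    return [value / open_total for value in lam[1:]]

def average_one_factors(lam):
    """Multiply each proportion by N so that the factors average one."""
    return [len(lam) * value for value in lam]

# Hypothetical returns: 3 days, 4 periods per day.
days = [[0.02, 0.01, 0.00, -0.01],
        [-0.01, 0.02, 0.01, 0.01],
        [0.01, -0.01, 0.00, 0.02]]

lam = variance_proportions(days)
kappa = open_market_proportions(lam)
lam_star = average_one_factors(lam)
print(lam, kappa, lam_star)
```

Passing only the Fridays, or only announcement days, as `returns_by_day` corresponds to choosing a subset S of days.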

Figure 12.6. S&P 100 variance proportions.


12.5.2 Equity Examples 


Figure 12.6 shows the estimated open-market variance proportions as percent- 
ages for the four months of spot S&P 100 five-minute returns considered in Sec- 
tion 12.4.4. Here the first period commences at 09:35 EST to avoid any unusual 
effects around the open at 09:30. The estimates \(\hat{\kappa}_j\) are shown by dots. The smooth
curve \(\tilde{\kappa}_j\) shown by a continuous trace is obtained by regressing the \(\hat{\kappa}_j\) on two
polynomial terms and two sinusoidal functions. The intraday volatility pattern
has a clear U-shape that can also be seen in Wood et al. (1985) and Andersen
and Bollerslev (1997b). The autocorrelations of scaled intraday absolute returns,
\(|r_{t,j}|/\kappa_j\), are free of periodic effects, as can be seen from the dotted curve in
Figure 12.5. The variance proportion from 16:00 until 09:35 EST on the next trading
day is estimated as \(\hat{\lambda}_1 = 0.23\) for the data used to produce Figures 12.5 and 12.6.

A second equity index example is given by five-minute FTSE 100 futures 
returns from November 1993 to July 1998. Figure 12.7, reproduced from Areal 
and Taylor (2002), shows simple estimates \(\hat{\kappa}_j\) and smooth values \(\tilde{\kappa}_j\) for the
open-market period from 08:35 until 16:10 local time. The intraday pattern has a high
initial value when the futures market opens, a minor peak in the interval from 
09:30 to 09:35 when UK macro news is announced, a major peak at 13:30 when 
US macro news is released, and a generally higher level once US equity markets 
open at 14:30. Similar patterns are seen in the shorter UK datasets of ap Gwilym, 
Buckle, and Thomas (1999) and Tse (1999). 

Further insight into the periodic pattern can usually be obtained by estimating
it for the five days of the trading week. Figure 12.8 shows that the UK pattern is
similar through the week except at the open and at the time of US announcements.
More than 7% of the intraday variance occurs in the first five minutes on Mondays,
compared with 4% to 6% on the other days. The 13:30 announcement effect is
much more pronounced on Fridays than on other days. As all Fridays are included
in the calculations, the effect is even greater on announcement Fridays. There
is also a minor volatility peak during the final minutes of trading before the
weekend. The closed-market proportion is highest for the weekend period and
equals 38% for the three days from Friday 16:10 until Monday 16:10, compared
with \(\hat{\lambda}_1 = 31\%\) when all days are analyzed together.

Figure 12.7. Five-minute fitted open-market variance proportions for the FTSE 100 futures
index, using all days of the week, for the period from 18 November 1993 to 17 July 1998.
This figure and Figures 12.8 and 12.11-12.16 are taken from Journal of Futures Markets,
N. M. P. C. Areal and S. J. Taylor, Copyright © (2002). Reprinted by permission of John
Wiley & Sons, Inc.


12.5.3 FX Examples


Volatility patterns for exchange rates are described by Müller, Dacorogna, Olsen,
Pictet, and Schwarz (1990), Dacorogna et al. (1993, 2001), and Andersen and 
Bollerslev (1997b, 1998a). 

We illustrate the typical DM/$ pattern by discussing Figure 12.9, which is 
reproduced from Taylor and Xu (1997). They use five-minute DM/$ returns to 
estimate hourly volatility factors from the year of O&A quotations, after deleting 
the standard 48-hour weekend period. Factors are found for each of the 1440 five- 
minute periods during a five-day week using equation (12.5). Sums of twelve 
consecutive five-minute factors define hourly variance factors that are scaled to 
average one. The derived standard deviation multipliers are shown in Figure 12.9. 
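
The aggregation from five-minute factors to hourly standard deviation multipliers can be sketched as below. The function name and the two hours of input factors are hypothetical; the sketch assumes the number of five-minute factors is a multiple of twelve.

```python
import math

def hourly_std_multipliers(five_minute_factors, per_hour=12):
    """Sum five-minute variance factors by hour, scale the hourly variance
    factors to average one, and return their square roots."""
    n_hours = len(five_minute_factors) // per_hour
    hourly = [sum(five_minute_factors[h * per_hour:(h + 1) * per_hour])
              for h in range(n_hours)]
    mean_hourly = sum(hourly) / n_hours
    variance_multipliers = [v / mean_hourly for v in hourly]
    return [math.sqrt(v) for v in variance_multipliers]

# Hypothetical input: a quiet hour followed by an hour with three times the variance.
factors = [0.01] * 12 + [0.03] * 12
multipliers = hourly_std_multipliers(factors)
print([round(m, 3) for m in multipliers])
```

The squared multipliers average one by construction, whereas the multipliers themselves average slightly less than one.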

Figure 12.8. Five-minute open-market variance proportions for the FTSE 100 futures
index, by day of the week, for the period from 18 November 1993 to 17 July 1998.


The clock used for the calculations is Eastern Standard Time, rather than GMT, 
because the periodic pattern changes when US clocks are reset in the spring and 
the autumn (Andersen and Bollerslev 1998a). The first interval in Figure 12.9 is
for the hour from 14:30 to 15:30 EST and the last is from 13:30 to 14:30 EST 
on the next day. The Friday symbol, for example, is used for the 24 hours from 
14:30 on Thursday until 14:30 on Friday. The reason for the unusual definition of 
a day is that the research study used days that ended when the FX options market 
closed in Philadelphia. 

DM/$ volatility is seen to be relatively high during the twelve hours when Euro- 
pean dealers are active (07:30 to 19:30 in London, 02:30 to 14:30 EST), with the 
highest levels when both US and European dealers are active (07:30 to 13:30 EST).
The peak in interval 19 shows the importance of US macro news, with Friday and 
then Thursday being the most important days. Half the Fridays in the sample had 
an announcement about a significant macroeconomic variable so the peak would 
be much higher if the nonannouncement Fridays were not used in the calculations. 
The local maximum in interval 13 occurs when trade accelerates in Europe around 
07:30 local time in London (08:30 in Frankfurt). There is also a spike on Mondays 
in interval 6, which is around the start of a new week in the Far East, while the 
lowest volatility levels were at lunchtime in Tokyo around intervals 9 and 10.

Figure 12.9. DM/$ intraday standard deviation multipliers. Reprinted from Journal of
Empirical Finance, volume 4, S. J. Taylor and X. Xu, The incremental volatility information in
one million foreign exchange quotations, pp. 317-340, Copyright © (1997), with permission
from Elsevier.


The autocorrelations of scaled intraday absolute returns, \(|r_{t,j}|/\tilde{\lambda}_j\), do not contain
periodic effects for the O&A DM/$ data, shown by Andersen and Bollerslev
(1997b) for five-minute returns and by the dotted curve in Figure 12.4 for thirty- 
minute returns, from Chang and Taylor (2003). 


12.5.4 θ-time


Another methodology for removing periodic effects from the volatility of intra- 
day returns redefines the timescale so that the average volatility is the same for 
all intraday intervals during a fixed period of time, typically one day or one 
week. This new timescale is called θ-time by Dacorogna et al. (1993, 2001).
The θ-clock runs faster during the most volatile trading hours so returns are then
calculated from shorter time intervals. Thus, for example, a day of five-minute
returns on the θ-clock will contain 288 intraday returns calculated from irregularly
spaced prices. There will be volatility persistence present in such returns
but periodic volatility effects are eliminated from the autocorrelations of absolute 
returns (Dacorogna et al. 2001, Figures 7.5 and 7.6). 
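
The idea of a deformed clock can be illustrated with a simplified construction. It is an assumption made here, and not the procedure of Dacorogna et al., that the clock advances within each calendar interval in proportion to that interval's share of variance and at a constant rate inside the interval. The function name and the four hourly proportions are hypothetical.

```python
def theta_time_grid(proportions, interval_minutes, n_points):
    """Calendar times (minutes from the start) at which the cumulative
    variance share reaches k / n_points, for k = 1, ..., n_points."""
    total = sum(proportions)
    times, cum, j = [], 0.0, 0
    for k in range(1, n_points + 1):
        target = total * k / n_points
        # step forward until interval j contains the target share
        while j < len(proportions) - 1 and cum + proportions[j] < target:
            cum += proportions[j]
            j += 1
        share = proportions[j]
        frac = min((target - cum) / share, 1.0) if share > 0 else 1.0
        times.append((j + frac) * interval_minutes)
    return times

# Hypothetical U-shaped variance proportions for four one-hour intervals.
proportions = [0.4, 0.1, 0.1, 0.4]
grid = theta_time_grid(proportions, interval_minutes=60, n_points=4)
print(grid)
```

Successive grid points are closer together in calendar time during the volatile first and last hours, so each of the four resulting returns carries the same share of variance.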


12.6 Discrete-Time Intraday Volatility Models 


Intraday returns can be used to model and forecast intraday volatility by adapting 
the ARCH models of Chapters 9 and 10 and the stochastic volatility models 
of Chapter 11. We now discuss a few methods. Later we cover ways to use
high-frequency data to model and predict daily volatility (Sections 12.8, 12.9,
and 15.7).


12.6.1 Intraday GARCH 


Routine estimation of the GARCH(1, 1) model defined in Section 9.3 gives unsat- 
isfactory predictions because the periodic volatility effects are then ignored. One 
way to see this is to consider the half-lives of variance forecasts, defined and illus- 
trated in Section 9.4. When the GARCH(1, 1) model is estimated from K intraday 
returns per trading day, the half-life is given by \(H_K = \log(0.5)/(K \log(\alpha_K + \beta_K))\)
trading days. These values will be similar as K varies when the model is correctly
specified, i.e.

\[ (\alpha_K + \beta_K)^K = (\alpha_1 + \beta_1), \tag{12.7} \]


from the aggregation results of Drost and Nijman (1993). 

Andersen and Bollerslev (1997b) calculate half-lives for a year of DM/$ quota- 
tions and find they are highly irregular, decreasing from nine hours to one-and-a- 
half hours as the frequency of returns decreases from fifteen to ninety minutes and 
then increasing to levels around twelve days for four-hourly and lower frequencies.
They also estimate \(\alpha_K + \beta_K > 1\) for the five- and ten-minute frequencies.
More satisfactory results are obtained by estimating the GARCH(1, 1)-MA(1) 
model from sums of scaled five-minute returns, 


\[ R_{t,j} = r_{t,j}/\tilde{\lambda}_j, \tag{12.8} \]

with \(r_{t,j}\) a five-minute return, t counting days, j counting five-minute periods
within day t, and the scale factor \(\tilde{\lambda}_j\) based upon (12.6). Their GARCH model for
aggregated scaled returns measured over 5k minutes can be written as

\[ R_{t,n}^{(k)} = \sum_{j=(n-1)k+1}^{nk} R_{t,j}, \quad 1 \le n \le K, \tag{12.9} \]

with K = 288/k and conditional variances given by

\[ h_{t,n}^{(k)} = \omega_k + \alpha_k \bigl( \varepsilon_{t,n-1}^{(k)} \bigr)^2 + \beta_k h_{t,n-1}^{(k)}. \tag{12.10} \]


The estimated persistence measures, \(\alpha_k + \beta_k\), then define half-lives that are fairly
similar at frequencies of two hours or lower but they remain rather unsatisfactory 
at the highest frequencies. The measures are 0.971 for the five-minute frequency, 
0.917 for twenty minutes, 0.989 for two hours and 0.980 for eight hours, with 
respective half-lives of approximately two hours, three hours, five days, and eleven 
days. 
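
The half-lives quoted above can be checked from the persistence measures, using the half-life formula of this section with 288 five-minute periods per day. The function names are hypothetical. The second function applies the aggregation relation to show what the five-minute estimate would imply at the two-hour frequency if the GARCH(1, 1) model were correctly specified.

```python
import math

def half_life_days(persistence, returns_per_day):
    """Half-life of a variance forecast, in trading days."""
    return math.log(0.5) / (returns_per_day * math.log(persistence))

def aggregated_persistence(persistence, m):
    """Persistence implied for returns aggregated over m consecutive periods."""
    return persistence ** m

# Reported persistence measures, keyed by the return interval in minutes.
reported = {5: 0.971, 20: 0.917, 120: 0.989, 480: 0.980}
half_lives = {minutes: half_life_days(p, 1440 // minutes)
              for minutes, p in reported.items()}

for minutes, days in half_lives.items():
    print(minutes, "minutes:", round(days * 24, 1), "hours =", round(days, 2), "days")

implied_two_hour = aggregated_persistence(0.971, 24)
print("two-hour persistence implied by the five-minute estimate:",
      round(implied_two_hour, 3))
```

The implied two-hour persistence is roughly one-half, far below the estimate of 0.989, which is one way to see that the high-frequency estimates remain unsatisfactory.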

Taylor and Xu (1997) analyze the same DM/$ data and model hourly conditional 
variances using both five-minute and hourly returns. Now let \(r_{t,j}\) denote a five-minute
return with t counting five-day weeks rather than days. Then their Table 3
includes results for the hourly returns described by the specification
12n
j=12(n−2)+1
1 ≤ n ≤ 120. (12.12)


Here the variance multipliers \(\lambda_n^{*}\) average 1 and their square-roots were previously
shown in Figure 12.9. When j is zero or negative the pair of subscripts t, j
refers to time period t − 1, n − j. For conditional normal distributions the parameter
estimates include \(\alpha = 0.0045\), \(\beta = 0.9480\), and \(\gamma = 0.0319\). Thus much more
weight is given to the last hour of squared five-minute returns (through parameter
\(\gamma\)) than is given to the most recent one-hour return (by parameter \(\alpha\)). Hypothesis
tests accept \(\alpha = 0\) and reject \(\gamma = 0\) at low significance levels. Similar results
are obtained for conditional GED distributions, with the tail-thickness parameter
estimated as 1.15, and for specifications having variance multipliers that differ on
announcement Fridays from those on the other Fridays. In all cases, the persistence
\(\alpha + \beta + \gamma\) is estimated to be between 0.984 and 0.985, giving a half-life of two
days, which matches the results in Andersen and Bollerslev (1997b). 
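
As a check on the arithmetic, the reported point estimates can be combined with the half-life formula given earlier in this section, taking K = 24 hourly returns per day (an assumption consistent with the hourly specification):

```latex
\[
  \alpha + \beta + \gamma = 0.0045 + 0.9480 + 0.0319 = 0.9844,
  \qquad
  H = \frac{\log(0.5)}{24 \,\log(0.9844)} \approx 1.8 \ \text{days}.
\]
```

Persistence values of 0.984 and 0.985 give approximately 1.8 and 1.9 days respectively, which rounds to the two days quoted.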

A third way to estimate the GARCH(1, 1) model from intraday returns uses 
return intervals defined by the θ-clock described in the previous section.
Dacorogna et al. (2001, Table 8.2) do this for seven years of DM/$ quotes. They
emphasize that their persistence estimates are incompatible with the aggregation 
formula (12.7), which fails for frequencies higher than six hours. Their estimates 
of \(\alpha_K + \beta_K\) are 0.992 for the θ-time equivalent of ten-minute returns, only 0.968
for thirty-minute returns, and 0.988 for hourly returns. 


12.6.2 HARCH 


The short half-lives of volatility shocks from high-frequency data certainly appear 
to contradict the longer half-lives estimated from daily data. They show that a 
GARCH(1, 1) model, with periodic volatility effects, oversimplifies the volatility 
dynamics. A more accurate volatility model will contain several components. 
These components may be defined by a variety of economic factors, such as 
macro news that has low persistence and technological variables that have high 
persistence, leading to a long memory volatility model if enough assumptions are 
made (Andersen and Bollerslev 1997a).

Alternatively, the components might also reflect the actions of traders whose 
decision horizons vary widely, from arbitrageurs who seek very rapid gains
to long-term investors who may only consider prices once a month. Müller,
Dacorogna, Davé, Olsen, Pictet, and von Weizsäcker (1997) discuss the hetero- 
geneity of traders in detail and thereby motivate their heterogeneous ARCH model. 
Now suppose \(r_t\) represents intraday returns measured in θ-time. Then the basic
HARCH model specifies conditional variances by

\[ h_t = c_0 + \sum_{j=1}^{n} c_j \Bigl( \sum_{i=1}^{j} r_{t-i} \Bigr)^2. \tag{12.13} \]

The square of the latest j-period return has weight \(c_j\) in this model. The number
n is taken to be large so that returns measured over very many horizons are able to
have an impact upon future volatility. Müller et al. (1997) estimate the model for
thirty-minute returns with n = 4096, which corresponds to twelve weeks. They
reduce the number of free parameters to eight by making blocks of the \(c_j\) equal,
using \(c_{j+1} = c_{j+2} = \cdots = c_{4j}\) for j = 1, 4, 16, ..., 1024. All the parameter
estimates are highly significant for seven years of DM/$ quotes and the model 
has a much higher log-likelihood than GARCH(1, 1). 
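
A minimal sketch of the HARCH variance calculation follows, taking the conditional variance to be a constant plus a weighted sum of squared j-period returns, each formed from the latest j one-period returns. The function names and numerical inputs are hypothetical. The second function expands seven block values into 4096 weights according to the block structure just described; together with the constant term this gives eight free parameters.

```python
def harch_variance(past_returns, c0, weights):
    """Conditional variance: c0 plus weights[j-1] times the squared sum of
    the latest j returns, for j = 1, ..., len(weights).

    past_returns: most recent return first, with at least len(weights) values.
    """
    h = c0
    running = 0.0
    for j, cj in enumerate(weights, start=1):
        running += past_returns[j - 1]   # sum of the latest j returns
        h += cj * running ** 2
    return h

def block_weights(values):
    """Expand [c_1, v_1, v_4, v_16, ...] so that c_{j+1} = ... = c_{4j} = v_j."""
    weights = [values[0]]
    j = 1
    for v in values[1:]:
        weights.extend([v] * (3 * j))    # positions j+1, ..., 4j
        j *= 4
    return weights

# Hypothetical inputs.
h = harch_variance([0.01, -0.02, 0.03], c0=0.0, weights=[1.0, 1.0, 1.0])
weights = block_weights([0.1] * 7)
print(h, len(weights))
```

Keeping a running sum of returns avoids recomputing each j-period return from scratch, which matters when n is as large as 4096.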

Dacorogna, Müller, Olsen, and Pictet (1998) modify the HARCH specification,
reducing the number of j-period intervals from 4096 to 7 in their empirical work 
and thereby speeding up the estimation of the parameters. Their EMA-HARCH 
model aggregates volatility components defined by exponentially weighted mov- 
ing averages of squared j-period returns; thus, 


\[ h_t = C_0 + \sum_{j=1}^{7} C_j h_{j,t}, \]

\[ h_{j,t} = a_j h_{j,t-1} + (1 - a_j) \Bigl( \sum_{i=1}^{k_j} r_{t-i} \Bigr)^2, \tag{12.14} \]

\[ a_j = \exp\Bigl( -\frac{2}{k_{j+1} - k_j} \Bigr), \]

and they choose \(k_1 = 1\) and \(k_j = 1 + 4^{j-2}\) for \(j \ge 2\). This model has a higher
log-likelihood than its predecessor, for ten years of prices separated by thirty
minutes of θ-time, for each of the DM/$, £/$, SF/$, yen/$, and DM/yen rates.
The estimates of the parameters \(C_j\) are strikingly similar across currencies. The
seven estimates of \(k_j C_j\) equal 0.15 (j = 1), 0.19, 0.18, 0.05, 0.14, 0.11, and 0.11
(j = 7) for the DM/$ series, all with standard errors near 0.01.


12.6.3 Intraday SV 


There are few studies that have estimated a variant of the standard stochastic
volatility model from high-frequency data. The standard SV model of Section 11.5
represents \(\log(\sigma_t)\) as an AR(1) process with mean \(\alpha\), innovation variance \(\sigma_\eta^2\), and
autoregressive parameter \(\phi\). Ghysels, Gourieroux, and Jasiak (1998) restate the
SV model in operational time which is driven by market activity. Their timescale
is dynamic, unlike θ-time, and its increments are functions of expected activity
variables \(\hat{y}_{t-1}\) and surprises in those variables, \(y_{t-1} - \hat{y}_{t-1}\). They summarize
activity by quotation counts, bid-ask spreads, and absolute returns. Their method
essentially replaces \(\alpha\), \(\phi\), and \(\sigma_\eta\) for period t in the calendar-time SV model by
functions of \(\exp(\beta_1 \hat{y}_{t-1} + \beta_2 (y_{t-1} - \hat{y}_{t-1}))\), for parameters \(\beta_1\) and \(\beta_2\). As \(\alpha\) is a
function of market activity, periodic volatility effects can appear in the calendar-
time model which is estimated for the O&A dataset by the QMLE method of
Harvey et al. (1994) using twenty-minute returns. Ghysels et al. conclude that
absolute returns are probably the most satisfactory of the activity variables for the
DM/$ data.

A continuous-time SV model is estimated in Barndorff-Nielsen and Shephard 
(2001) from five-minute DM/$ returns. This model is described in Section 13.6, 
where it is noted that their four-component volatility model contains a dominant 
short-term component and three persistent components. 


12.7 Trading Rules and Intraday Prices


The apparent abundance of information offered by high-frequency price datasets 
may tempt us to hope that technical trading rules will then be more successful 
than when applied to daily prices. The principles that motivate the efficient market 
hypothesis are, of course, unaffected by the frequency of available data. Com- 
petition among traders will tend to eliminate profitable methods because traders 
generally share the same information. Furthermore, the prospect of more frequent 
trades from more frequent data is no advantage when transaction costs must be 
paid. 

Section 7.10 noted that central bank interventions may permit profitable
currency trading. It would then be potentially valuable to review prices more often
than daily, particularly if some traders learn about interventions before others
(Peiers 1997; Chang and Taylor 1998; Frenkel, Pierdzioch, and Stadtmann 2001).


12.7.1 Methods 


Standard moving-average trading rules for daily data can also be applied to more 
frequent observations, as discussed and investigated by Acar and Lequeux (1999). 
It is not surprising, however, that more complicated methods have been tried that 
seek to extract patterns from the large price histories available. 

Nearest neighbor methods produce forecasts by seeking past periods that have 
returns similar to those recently observed. These methods essentially provide 
nonparametric predictors. Although they are often motivated by chaos theory, the 
predictors do not require chaotic dynamics to be successful. The predictors should 
be as successful as the best alternatives whenever the price process is stationary
and there is a long price history. 
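
A minimal Python sketch of a nearest neighbor predictor follows. It assumes squared Euclidean distance between windows of returns and an equally weighted average of the successors of the \(k\) closest windows. These choices, and the function name, are illustrative and are not taken from any of the cited studies.

```python
def nearest_neighbor_forecast(returns, window=3, k=2):
    """Forecast the next return from the k past windows closest to the latest one."""
    if len(returns) < window + k:
        raise ValueError("not enough history for the requested window and k")
    target = returns[-window:]
    candidates = []
    # Only windows that are followed by an observed return can be used.
    for start in range(len(returns) - window):
        past = returns[start:start + window]
        distance = sum((a - b) ** 2 for a, b in zip(past, target))
        successor = returns[start + window]
        candidates.append((distance, successor))
    candidates.sort(key=lambda pair: pair[0])
    nearest = candidates[:k]
    return sum(successor for _, successor in nearest) / len(nearest)


# Toy data in which the pattern 1, 2, 3 was followed by 10 and then by 20.
toy_history = [1, 2, 3, 10, 1, 2, 3, 20, 1, 2, 3]
toy_forecast = nearest_neighbor_forecast(toy_history, window=3, k=2)
```

No parametric model is fitted, which is why the predictor is described as nonparametric. Its accuracy rests on the history containing enough similar episodes.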

Neural network methods provide price forecasts from within a general family 
of functions that can approximate any nonlinear function to a stated degree of 
accuracy. The approximating function may have many parameters that can easily 
be estimated from a large dataset. Alexander (2001) describes both neural network 
and nearest neighbor methods. 

Genetic algorithms develop trading rules by optimization during a learning 
period. Flexible decision rules can evolve through time by discarding the least 
successful rules and permitting random variation within the retained rules. For 
further details see Neely et al. (1997) and Allen and Karjalainen (1999). These 
methods can be expected to thrive on large datasets if they have any potential. 


12.7.2 Results 


All the methodological issues emphasized in Section 7.7 for trading rules applied 
to daily prices are relevant for high-frequency data. It is essential that research 
studies reserve some data that are not used to design rules and optimize their 
parameters. Also note that the assumption that we can trade at recorded prices 
without changing the price path must become less realistic as trades become more 
frequent. 

There is not much evidence from high-frequency trading rules against the effi- 
cient market hypothesis in the research literature. Acar and Lequeux (1999) find 
that none of their moving-average style trading rules produces significant prof- 
its, after transaction costs, when applied to the O&A dataset described in Sec- 
tion 12.2. Even traders paying marginal transaction costs appear unable to earn 
excess returns. Toulson, Toulson, and Sinclair (1999) apply neural networks and 
wavelet transform methods to two years of tick data for futures trading of five 
assets at LIFFE. They claim returns “modestly in excess” of those from buy and
hold strategies. Alexandré, Girerd-Potin, and Taramasco (1998) evaluate nearest 
neighbor forecasts for two exchange rate series with disappointing results. Like- 
wise, Dunis, Gavridis, Harris, Leong, and Nacaskul (1998) explore the potential 
of genetic algorithms based upon momentum and relative strength indicators for 
two FX series. They discover that in-sample optimized profits do not recur in their 
out-of-sample period. Neely and Weller (2003) evaluate the performance of an 
autoregressive forecasting method and a genetic program. Their FX results are 
consistent with an efficient market after realistic transaction costs are taken into 
account. 

Dacorogna et al. (2001) present the case for profitable foreign exchange trad- 
ing based upon the extensive research of Olsen & Associates. Perhaps we should 
expect any profitable opportunities to be exploited by O&A, given their consider- 
able investment in data acquisition and research. Dacorogna et al. (2001, p. 296) 
tell us: “The purpose of this chapter is not to provide ready-to-use trading strategies,
but to give a description of the main ingredients needed in order for any
real-time trading model to be usable for actual trading on financial markets.” And 
then: “Our models anticipate price movements in the foreign exchange market 
sufficiently well to be profitable for many years yet with acceptable risk behavior, 
and they have been used by many banks.” The ingredients of their trading method- 
ology include genetic algorithms and strict optimization and testing procedures. 

Chapter 11 of their book includes plenty of detail about the methodology of
genetic algorithms and their literature. Among their results are out-of-sample
annual returns of 3-6%, using hourly data from 1987 to 1995 (their Table 11.2
and Chopard, Pictet, and Tomassini 2000), 11-13% for the period 1986-1993
(Table 11.3), and 5% per annum from 1993 to 1997 (Table 11.12). As far as I can
tell, these returns are approximately in excess of the domestic risk-free rate. The
critical figure, however, is the excess return on capital invested with O&A in more
recent years. Some related research is described in Gençay, Ballocchi, Dacorogna,
Olsen, and Pictet (2002) for simpler trading rules that have also been profitable
net of transaction costs.


12.8 Realized Volatility: Theoretical Results 


12.8.1 Realized Volatility 


Volatility during a period of time can be estimated more and more precisely as 
the frequency of returns increases, providing intraperiod returns are uncorrelated 
and certain other conditions apply. We suppose the periods are trading days with
daily returns \(r_t\) that are the sum of \(N\) intraday returns \(r_{t,j,N}\); thus,

\[ r_t = \sum_{j=1}^{N} r_{t,j,N}. \tag{12.15} \]

For \(N = 1, 2, 3, \ldots\) we define the realized variance for day \(t\) as

\[ \hat{\sigma}^2_{t,N} = \sum_{j=1}^{N} r^2_{t,j,N} \tag{12.16} \]


and we refer to \(\hat{\sigma}_{t,N}\) as the realized volatility. This and related measures of volatility
appear in several high-frequency studies, early examples including Schwert
(1990b), Hsieh (1991), Zhou (1996), and Taylor and Xu (1997). The quantity \(\hat{\sigma}^2_{t,N}\)
is simply \(N\) times the sample variance of \(N\) intraday returns (assuming a zero
mean) and hence it is a natural estimate of daily variance.
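
The calculation is simple enough to state as code. The Python sketch below sums squared intraday returns to obtain the realized variance, takes the square root for the realized volatility, and annualizes with an assumed 251 trading days per year. The function names and sample returns are hypothetical.

```python
import math


def realized_variance(intraday_returns):
    """Sum of squared intraday returns for one day."""
    return sum(r * r for r in intraday_returns)


def realized_volatility(intraday_returns):
    """Square root of the realized variance."""
    return math.sqrt(realized_variance(intraday_returns))


def annualized_volatility(intraday_returns, trading_days=251):
    """Daily realized volatility scaled to an annual standard deviation."""
    return math.sqrt(trading_days) * realized_volatility(intraday_returns)


sample_returns = [0.01, -0.02, 0.01]  # illustrative values only
daily_return = sum(sample_returns)    # close to zero for this sample
rv = realized_variance(sample_returns)
```

The sample day shows why squared intraday returns are informative: the daily return is essentially zero, yet the realized variance is clearly positive.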

Figure 12.10 illustrates annualized values of \(\hat{\sigma}_{t,N}\), for the single day of FTSE
transaction data described in Section 12.3. A logarithmic scale is used for \(N\). An
estimate of \(\hat{\sigma}_{t,N}\) is marked by a dot for all values of \(N\) that are factors of 540
(between 3 and 540 inclusive), for the nine-hour period commencing at 08:15.

Figure 12.10. Realized volatilities for one day.


Andersen and Bollerslev (1998b), Barndorff-Nielsen and Shephard (2001), and
Comte and Renault (1998) concurrently and independently showed that random
variables \(\hat{\sigma}^2_{t,N}\) converge to a limit \(\sigma_t^2\), as \(N \to \infty\), that represents the squared
volatility for period \(t\), when various assumptions are made. Subsequently, Andersen,
Bollerslev, Diebold, and their co-authors used sample values of \(\hat{\sigma}^2_{t,N}\), for
large \(N\), to infer interesting results about the distributional and autocorrelation
properties of \(\sigma_t\). We now provide some theoretical intuition and results for their
empirical methods, including results about the accuracy of the estimator \(\hat{\sigma}^2_{t,N}\). A
discussion of the empirical evidence then follows in the next section. Expected
returns are assumed to be zero in the theory that follows and proofs are given in
the appendix to this chapter. For a mathematical survey of the theory of volatility
measurement, see Andersen, Bollerslev, and Diebold (2005).


12.8.2 A Simple Example 


First, consider the simplest situation when \(\sigma_t\) is a number that represents the latent
volatility for day \(t\) and the intraday returns are conditionally Gaussian and i.i.d.
for all \(N\):

\[ r_{t,j,N} \mid \sigma_t \sim \text{i.i.d. } N(0, \sigma_t^2 / N), \quad 1 \le j \le N. \tag{12.17} \]

Then \(r_t \mid \sigma_t \sim N(0, \sigma_t^2)\). Here the number \(\sigma_t\) is any possible outcome of a random
volatility variable, while the terms \(r_{t,j,N}\) are random variables.

Each of the \(N\) random quantities \(N r^2_{t,j,N}\) provides an independent and unbiased
estimate of \(\sigma_t^2\). The variance of their average is proportional to \(1/N\), so this average
will converge to \(\sigma_t^2\) as \(N\) increases. We have


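\[ E[\hat{\sigma}^2_{t,N} \mid \sigma_t] = \sigma_t^2 \tag{12.18} \]

and

\[ \operatorname{var}(\hat{\sigma}^2_{t,N} \mid \sigma_t) = \frac{2\sigma_t^4}{N}. \tag{12.19} \]

Consequently, as \(N \to \infty\) the realized variance \(\hat{\sigma}^2_{t,N}\) converges to the squared
latent volatility \(\sigma_t^2\).

If, instead, we let \(\sigma_t\) represent a random variable, with \(r_{t,j,N} = \sigma_t u_{t,j,N} / \sqrt{N}\)
and \(\sigma_t\) independent of \(u_{t,j,N} \sim \text{i.i.d. } N(0, 1)\), then

\[ E[\hat{\sigma}^2_{t,N} - \sigma_t^2] = 0 \tag{12.20} \]

and

\[ \operatorname{var}(\hat{\sigma}^2_{t,N} - \sigma_t^2) = \frac{2E[\sigma_t^4]}{N}. \tag{12.21} \]

Again \(\hat{\sigma}^2_{t,N} - \sigma_t^2 \to 0\) as \(N \to \infty\), with the convergence being in mean square
and hence in probability.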
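
The unbiasedness and the \(2\sigma_t^4/N\) variance of the realized variance in this simple setting can be checked by simulation. The Python sketch below draws i.i.d. Gaussian intraday returns for a fixed volatility and compares sample moments with the theoretical values. The seed, the replication count, and the function names are arbitrary assumptions.

```python
import random


def simulate_realized_variance(sigma, n_intraday, rng):
    """One draw of realized variance from i.i.d. N(0, sigma**2 / N) intraday returns."""
    scale = sigma / n_intraday ** 0.5
    return sum(rng.gauss(0.0, scale) ** 2 for _ in range(n_intraday))


def monte_carlo_moments(sigma=1.0, n_intraday=50, replications=4000, seed=12345):
    """Sample mean and sample variance of simulated realized variances."""
    rng = random.Random(seed)
    draws = [simulate_realized_variance(sigma, n_intraday, rng)
             for _ in range(replications)]
    mean = sum(draws) / replications
    variance = sum((x - mean) ** 2 for x in draws) / (replications - 1)
    return mean, variance


sim_mean, sim_var = monte_carlo_moments()
theory_mean = 1.0 ** 2             # sigma squared
theory_var = 2 * 1.0 ** 4 / 50     # 2 sigma**4 / N
```

With these settings the simulated moments should lie within a few percent of the theoretical ones. Increasing `n_intraday` shrinks the variance in proportion to \(1/N\).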


12.8.3 An Example with Periodic Effects 


A more realistic theoretical framework includes an intraday periodic volatility 
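pattern. For some number \(\sigma_t\) and for each \(N\), now assume

\[ r_{t,j,N} \mid \sigma_t \sim N(0, \lambda_{j,N} \sigma_t^2), \quad \lambda_{j,N} \ge 0, \quad \text{and} \quad \sum_{j=1}^{N} \lambda_{j,N} = 1, \tag{12.22} \]

with \(r_{t,j,N} \mid \sigma_t\) independent of \(r_{t,k,N} \mid \sigma_t\) whenever \(j \ne k\). The realized variance
is again unbiased and now

\[ \operatorname{var}(\hat{\sigma}^2_{t,N} \mid \sigma_t) = 2\sigma_t^4 \sum_{j=1}^{N} \lambda^2_{j,N}. \tag{12.23} \]

This again decreases to zero, as \(N \to \infty\), providing the multipliers \(\lambda_{j,N}\) diminish
at a sufficiently rapid rate. A necessary and sufficient condition is that

\[ \lambda^{*}_N = \max_{1 \le j \le N} \lambda_{j,N} \to 0 \quad \text{as } N \to \infty. \tag{12.24} \]

Thus, when the maximum of the multipliers converges to zero as \(N \to \infty\), the
realized variance \(\hat{\sigma}^2_{t,N}\) converges to \(\sigma_t^2\).

There are, however, reasonable conditions for which this convergence result
fails. For example, if \(j = 1\) corresponds to an overnight period when the market
is closed then \(\lambda_{1,N}\) is the same for all \(N\) and \(\operatorname{var}(\hat{\sigma}^2_{t,N} \mid \sigma_t) \ge 2\lambda^2_{1,N}\sigma_t^4 > 0\)
when \(\lambda_{1,N} > 0\). Likewise, if there is scheduled news that produces an instantaneous
jump in the price logarithm, conditionally distributed as \(N(0, \lambda\sigma_t^2)\), then
\(\operatorname{var}(\hat{\sigma}^2_{t,N} \mid \sigma_t) \ge 2\lambda^2\sigma_t^4\) for all \(N\).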

The realized volatility is not the most accurate estimator of \(\sigma_t^2\) when the periodic
pattern is known for each day \(t\). A more general unbiased estimator is

\[ \tilde{\sigma}^2_{t,N} = \sum_{j=1}^{N} w_{j,N} r^2_{t,j,N} \quad \text{with} \quad \sum_{j=1}^{N} w_{j,N} \lambda_{j,N} = 1. \tag{12.25} \]

Its conditional variance is

\[ \operatorname{var}(\tilde{\sigma}^2_{t,N} \mid \sigma_t) = 2\sigma_t^4 \sum_{j=1}^{N} w^2_{j,N} \lambda^2_{j,N}, \tag{12.26} \]

which is minimized when \(w_{j,N} = 1/(N\lambda_{j,N})\) (Areal and Taylor 2002) and then

\[ \operatorname{var}(\tilde{\sigma}^2_{t,N} \mid \sigma_t) = \frac{2\sigma_t^4}{N}. \tag{12.27} \]

This is the same variance as in (12.19). Thus the optimally weighted estimator
converges to \(\sigma_t^2\) as \(N\) increases.
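
A small numerical check of the weighting argument, written as a Python sketch. It assumes the variance proportions are known and strictly positive. The function names and the three proportions are hypothetical. The quantity computed is the sum of squared products of weights and proportions, which multiplies \(2\sigma_t^4\) in the conditional variance.

```python
def optimal_weights(lambdas):
    """Weights 1 / (N * lambda_j); every lambda_j must be strictly positive."""
    n = len(lambdas)
    return [1.0 / (n * lam) for lam in lambdas]


def variance_factor(weights, lambdas):
    """Sum of (w_j * lambda_j)**2; multiply by 2 * sigma**4 for the variance."""
    return sum((w * lam) ** 2 for w, lam in zip(weights, lambdas))


# Illustrative periodic pattern: the first interval carries half the daily variance.
pattern = [0.5, 0.25, 0.25]
equal = [1.0] * len(pattern)       # unit weights give the realized variance
best = optimal_weights(pattern)

unweighted_factor = variance_factor(equal, pattern)   # sum of lambda_j squared
weighted_factor = variance_factor(best, pattern)      # equals 1 / N
unbiased_check = sum(w * lam for w, lam in zip(best, pattern))
```

For this pattern the factor falls from 0.375 to 1/3. The gain grows as the pattern becomes more uneven, for example when one interval is an overnight return.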


12.8.4 A General SV Result 


Our most general result supposes that intraday returns follow a general stochastic
volatility process for all \(N\). Suppose \(r_{t,j,N} = \sigma_{t,j,N} u_{t,j,N} / \sqrt{N}\), \(u_{t,j,N} \sim N(0, 1)\),
and \(u_{t,j,N}\) is independent of \(\sigma_{t,j,N}\) and all variables \(u_{s,k,N}\) and \(\sigma_{s,k,N}\)
that are determined before time \(t, j\). We assume the average squared volatility is
the same for all \(N\); thus,

\[ \frac{1}{N} \sum_{j=1}^{N} \sigma^2_{t,j,N} = \sigma_t^2. \tag{12.28} \]

Then daily returns have the SV factorization \(r_t = \sigma_t u_t\) with \(\sigma_t\) independent of
\(u_t \sim N(0, 1)\). In the special context of a continuous-time diffusion model for
prices, \(\sigma_t^2\) equals both the integrated variance defined by \(\int_{t-1}^{t} \sigma^2(s)\,ds\) and the
quadratic variation (QV) of the logarithms of prices during day \(t\).

We now have

\[ \operatorname{var}(\hat{\sigma}^2_{t,N} - \sigma_t^2) = \frac{2}{N^2} \sum_{j=1}^{N} E[\sigma^4_{t,j,N}]. \tag{12.29} \]

This variance tends to zero as \(N\) increases providing the day’s volatility is not
concentrated around any particular time. A sufficient condition is

\[ \max_{j} \left( \sigma^2_{t,j,N} \Big/ \sum_{k} \sigma^2_{t,k,N} \right) \to 0 \quad \text{as } N \to \infty. \]

Another sufficient condition is a finite limit for \(N^{-1} \sum_j E[\sigma^4_{t,j,N}]\). Assuming one of
these conditions is met, \(\hat{\sigma}^2_{t,N} \to \sigma_t^2\).

This conclusion holds for continuous-time models of prices that are diffusion
processes, but not when there are jumps in the price process. The limit of the 
realized variance for a pure jump process is its quadratic variation, which equals 
the sum of squared jumps, but this is not a function of a latent volatility variable. 
Further theoretical discussion can be found in Andersen, Bollerslev, Diebold, and 
Labys (2001), Barndorff-Nielsen and Shephard (2002a), Andersen, Bollerslev, 
and Diebold (2003), and the appendix to this chapter. 


12.8.5 Measurement Error 


The convergence of \(\hat{\sigma}^2_{t,N}\) to a limit \(\sigma_t^2\) is a theoretical ideal. Trading is not
continuous and microstructure issues such as bid-ask spreads and price discreteness
ensure that realized volatility always contains some measurement error.

From (12.19), a naive estimate of the standard deviation of the measurement
error \(\hat{\sigma}^2_{t,N} - \sigma_t^2\) is \(\sqrt{2/N}\,\hat{\sigma}^2_{t,N}\). A typical high value of \(N\) is 288, for five-minute
returns with no market closures, and then the naive standard error equals one-twelfth
of the estimate \(\hat{\sigma}^2_{t,N}\). Periodic intraday volatility effects increase the standard
error, as the variance given by (12.23) exceeds that given by (12.19). Any
excess conditional kurtosis in intraday returns will also increase standard errors.

Barndorff-Nielsen and Shephard [BNS] (2002a) and Meddahi (2002) empha- 
size and illustrate the magnitude of measurement errors. BNS (2002a,b) provide 
asymptotic convergence results to the standard normal distribution for a class of 
general, continuous-time stochastic volatility models. Their results can be used 
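to estimate standard errors from

\[ \frac{\hat{\sigma}^2_{t,N} - \sigma_t^2}{\left( \frac{2}{3} \sum_{j=1}^{N} r^4_{t,j,N} \right)^{1/2}} \xrightarrow{D} N(0, 1) \tag{12.30} \]

and

\[ \frac{\log(\hat{\sigma}^2_{t,N}) - \log(\sigma_t^2)}{\left( \frac{2}{3} \sum_{j=1}^{N} r^4_{t,j,N} \Big/ \hat{\sigma}^4_{t,N} \right)^{1/2}} \xrightarrow{D} N(0, 1). \tag{12.31} \]

The second of these results provides the better approximation for practical values
of \(N\). The estimated standard error of \(\log(\hat{\sigma}^2_{t,N})\) is approximately \(\sqrt{2k_N/(3N)}\)
with \(k_N\) the sample kurtosis of the \(N\) intraday returns \(r_{t,j,N}\).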
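
The two kinds of standard error can be compared numerically. The Python sketch below computes the naive relative standard error \(\sqrt{2/N}\) and a standard error for the logarithm of realized variance based on the sum of fourth powers of intraday returns. The function names and the 1.96 critical value for a 95% interval are assumptions of the sketch, which also ignores microstructure noise.

```python
import math


def naive_relative_se(n_intraday):
    """Naive standard error of realized variance, as a fraction of the estimate."""
    return math.sqrt(2.0 / n_intraday)


def log_rv_standard_error(intraday_returns):
    """sqrt((2/3) * sum r**4 / (sum r**2)**2), the standard error of log(RV)."""
    s2 = sum(r ** 2 for r in intraday_returns)
    s4 = sum(r ** 4 for r in intraday_returns)
    return math.sqrt((2.0 / 3.0) * s4 / s2 ** 2)


def log_based_interval(intraday_returns, z=1.96):
    """Approximate confidence interval for the daily variance, built on the log scale."""
    rv = sum(r ** 2 for r in intraday_returns)
    se = log_rv_standard_error(intraday_returns)
    return rv * math.exp(-z * se), rv * math.exp(z * se)


ratio_288 = naive_relative_se(288)  # one-twelfth for five-minute returns over 24 hours
flat_day = [0.01, -0.01] * 3        # illustrative returns of equal magnitude
se_flat = log_rv_standard_error(flat_day)
low, high = log_based_interval(flat_day)
```

Working on the log scale keeps the lower limit positive, which is one practical reason to prefer the second result when \(N\) is moderate.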

Figure 12.10 shows two sets of 95% confidence intervals for one day of estimates
\(\hat{\sigma}^2_{t,N}\). One set is obtained from the naive standard errors and critical points of
the \(\chi^2_N\) distribution, while the other set is given by (12.31). The intervals are
generally wider for the more accurate standard errors given by (12.31) and they are
shown by the longer dashed lines.

The above standard errors ignore microstructure noise and hence they will
underestimate the variability of measurement errors. Microstructure effects will
also introduce bias into the estimates \(\hat{\sigma}^2_{t,N}\), which becomes more severe as \(N\)
increases. Bandi and Russell (2004a,b) show how the impact of microstructure
noise can be eliminated and the optimal sampling frequency can be derived. Their
average estimate of the optimal frequency for stock volatility calculations suggests
that returns should be calculated every four minutes. Aït-Sahalia, Mykland, and
Zhang (2005) provide further results about the impact of microstructure noise
when volatility is constant.


12.8.6 Bipower Estimates 


An alternative estimate of \(\sigma_t^2\) is provided by the realized bipower, which can be
defined by

\[ \frac{\pi}{2} \sum_{j=2}^{N} |r_{t,j-1,N}| \, |r_{t,j,N}|. \]

The realized variance and the realized bipower converge to the same limit when the 
price process is continuous and has martingale properties. The limits are different, 
however, when there are jumps in the price process. Comparisons between sample 
variance and bipower estimates can then be used to make inferences about the jump 
component (Barndorff-Nielsen and Shephard 2004a,b; Andersen, Bollerslev, and 
Diebold 2003; Huang and Tauchen 2004). 
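
The contrast between the two estimates when a jump occurs can be illustrated with a short Python sketch. It uses one common definition of realized bipower, \(\pi/2\) times the sum of products of adjacent absolute returns, without any finite-sample scaling. The function names and the data are hypothetical.

```python
import math


def realized_variance(intraday_returns):
    """Sum of squared intraday returns."""
    return sum(r * r for r in intraday_returns)


def realized_bipower(intraday_returns):
    """(pi / 2) times the sum of products of adjacent absolute returns."""
    pairs = zip(intraday_returns[:-1], intraday_returns[1:])
    return (math.pi / 2.0) * sum(abs(a) * abs(b) for a, b in pairs)


# Eleven small returns with a single large jump in the middle (illustrative).
with_jump = [0.001] * 5 + [0.05] + [0.001] * 5
rv_jump = realized_variance(with_jump)
bv_jump = realized_bipower(with_jump)
jump_share = (rv_jump - bv_jump) / rv_jump  # crude indication of the jump contribution
```

The jump enters the realized variance as its square, but it enters the bipower measure only through products with small neighboring returns, so the difference between the two is dominated by the jump.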


12.9 Realized Volatility: Empirical Results 


We now review a few empirical studies of realized variance (RV), in which the 
researchers select a specific number of intraday periods N and then calculate RV 
for day \(t\) from intraday returns \(r_{t,j}\) as

\[ \hat{\sigma}_t^2 = \sum_{j=1}^{N} r^2_{t,j}. \tag{12.32} \]

Their intention is to obtain an accurate estimate \(\hat{\sigma}_t\) of the latent volatility for day
\(t\), denoted by \(\sigma_t\). The intraday returns could be mean-adjusted in (12.32), but the
impact of such adjustments is negligible for large values of \(N\). We summarize
results selected from Andersen, Bollerslev, Diebold, and Labys (2000, 2001),
Andersen, Bollerslev, Diebold, and Ebens (2001), Ebens (1999), and Areal and
Taylor (2002), and refer to their work as ABDL, ABDE, E, and AT. There are
three major conclusions that are obtained in all four studies:

* the distribution of standardized returns \(r_t / \hat{\sigma}_t\) is almost normal;
* the distribution of \(\log(\hat{\sigma}_t)\) is approximately normal;
* realized volatility has a long memory feature, which can be modeled by a
fractionally integrated process.


12.9.1 Data


All four studies make use of five-minute returns and all exclude weekends and 
holidays. ABDL use DM/$ and yen/$ quotes for the ten years from December 
1986 to November 1996. Quotes are used for all twenty-four hours of the day and
then \(N = 288\). However, they calculate RV from thirty-minute returns in their
first paper. ABDE study the transaction prices for all thirty stocks in the DJIA 
from January 1993 until May 1998. Their transaction data come from records 
that extend from 09:30 EST to 16:05 EST, thus they have 79 five-minute returns 
per day. The returns when the market is closed are ignored, so that RV then 
measures the open-market variance, which is less than total daily variance. Both 
ABDL and ABDE also report interesting research into realized covariances and 
correlations. Ebens investigates the DJIA index for the same period that ABDE 
study the component stocks. It appears that he studies the open-market RV and
presumably \(N = 79\) again.

Areal and Taylor obtain returns from transaction prices for FTSE 100 futures, 
primarily for the period from March 1990 to July 1998 when trading was from 
08:35 to 16:10 London time. They use the overnight return as well as the 91 intra- 
day returns when calculating the total RV. They also calculate RV as a weighted 
sum of squared returns, as shown by equation (12.25). This makes the open-market 
RV slightly more accurate and the total RV much more accurate. Figure 12.11
shows the annualized total RV as a standard deviation, namely \(\sqrt{251}\,\hat{\sigma}_t\), for a
longer period and plotted on a logarithmic scale. The exceptional value of 365%
occurs on the day following the US crash on 19 October 1987.

It is desirable that RV is an unbiased estimate of the latent volatility. Bias will 
occur when the intraday returns are autocorrelated and can be substantial, as noted 
by Blair et al. (2001b) for the spot S&P 100 index. ABDE use the residuals from 
an MA(1) model to avoid systematic bias induced by bid-ask bounce effects. To
check for bias, the expectation of \(\sigma_t^2\) can be estimated by the variance of daily
returns and compared with the sample mean of estimates \(\hat{\sigma}_t^2\). AT find that the daily
variance is 92% of the average RV when the closed-market returns are included 
in the calculations. The corresponding ratios for the DM/$ and the yen/$ rates are 
0.95 and 0.92 from numbers tabulated by ABDL. 


12.9.2 The Distribution of Standardized Returns 


First we consider the distribution of daily returns standardized by their estimated
volatility,

\[ z_t = \frac{r_t - \hat{\mu}}{\hat{\sigma}_t}, \tag{12.33} \]

with \(\hat{\mu}\) an estimate of expected returns, which is set to zero in some studies. The
mean and standard deviation of the standardized returns should be respectively
near to zero and one when the measurement error in \(\hat{\sigma}_t^2\) is small. This occurs in
ABDL, E, and AT but ABDE have standard deviations below one for all thirty
stocks, with the median value being 0.81.

Figure 12.11. Annualized FTSE 100 volatility calculated each
day from optimally weighted squared five-minute returns.

A striking result in all these studies is the distribution of the standardized returns
\(z_t\). It is approximately normal. This can be seen from density plots in all the studies
and in Figure 12.12 for the FTSE data of AT. This shows a histogram for the \(z_t\), a
normal density (which matches the mean and variance of the \(z_t\)) as a dotted curve,
and the kernel density estimate as a solid curve (defined by (4.6) and a bandwidth
of 0.25).

The skewness of the \(z_t\) is approximately zero, with estimates of 0.02 and 0.00
for the two currencies in ABDL, 0.03 in E, 0.16 in AT, and a median stock value
of 0.11 in ABDE. The kurtosis of the \(z_t\) is near the normal value of three and the
estimates are 2.41 and 2.41 in ABDL, 2.75 in E, 2.77 in AT, and a median of 3.13
in ABDE. These kurtosis estimates are, of course, less than for the returns \(r_t\) and
they are also less than for returns that are standardized by conditional standard
deviations from ARCH or SV models.

The high-frequency data allow us to get closer to the latent volatility variable
than methods that only use daily data and this is revealed in the closer approximation
of the standardized returns to a normal distribution. The approximation is
good but not perfect, reflecting some measurement error and also the discreteness
of price changes. It does appear reasonable to model daily returns as a mixture
of normal distributions, mixing across different values of the time-varying latent
volatility \(\sigma_t\). The unconditional distribution of returns then follows from the
distribution of the mixing variable \(\sigma_t\), which we now consider.

Figure 12.12. The distribution of the daily standardized returns of
FTSE 100 index futures from March 1990 to July 1998.


12.9.3 The Distribution of Realized Volatility 


The empirical distribution of \(\hat{\sigma}_t^2\) is strongly skewed to the right and has very high
kurtosis, which reflects notable high outliers. The kurtosis estimates are 24 and
67 in ABDL and the median is 66 in ABDE. The distribution of the standard
deviation estimate \(\hat{\sigma}_t\) is less skewed and less kurtotic, but the kurtosis estimates
remain far above three: 7.8 and 10.4 in ABDL, 17 in E, and 19 in AT.

As popular stochastic volatility models assume \(\log(\sigma_t)\) has a normal distribution
(as in Section 11.5), it is interesting to consider the empirical distribution
of \(\log(\hat{\sigma}_t)\). These empirical distributions are approximately normal. As \(\log(\hat{\sigma}_t)\)
equals the latent term \(\log(\sigma_t)\) plus a relatively small error, the distribution of
\(\log(\sigma_t)\) is also near to normal. This implies that the distribution of returns is
approximately the lognormal-normal distribution advocated by Clark (1973),
Tauchen and Pitts (1983), and Taylor (1986).

Density plots in all four studies support a normal approximation
for \(\log(\hat{\sigma}_t)\). Figure 12.13 shows a histogram for the FTSE data of AT and the
matching normal density as a dotted curve. The kernel density estimate, with
bandwidth 0.1, is shown as a solid curve. The skewness of \(\log(\hat{\sigma}_t)\) is generally
positive: 0.35 and 0.26 in ABDL, a median of 0.19 in ABDE, 0.75 in E, and 0.44
in AT. These values are significantly different from zero at low significance levels.
Likewise, the kurtosis estimates for \(\log(\hat{\sigma}_t)\) are significantly above three: 3.26 and
3.53 in ABDL, a median of 3.89 in ABDE, 3.78 in E, and 3.71 in AT.

Figure 12.13. The distribution of the logarithm of realized
volatility for the FTSE 100 index from 1990 to 1998.

Thus \(\log(\hat{\sigma}_t)\) only has an approximate normal distribution, which is the most
that can be expected as there is no theoretical reason why volatility should have
a lognormal distribution. There are several possible explanations for the excess
kurtosis in the distribution of \(\log(\hat{\sigma}_t)\), including measurement error and extreme
high outliers. It is possible that occasional crises create excess probability in the
right tail relative to the normal distribution. AT note that the annualized values for
the FTSE index from 1990 to 1998 have a maximum value of \(\sqrt{251}\,\hat{\sigma}_t\) equal to
81% on 28 October 1997 and that Ebens reports exceptional values for the DJIA
index on both the 27th and the 28th. Also, three of the highest FTSE estimates
are for the day that sterling left the European Exchange Rate Mechanism and the
two following days, with all three annualized estimates above 50%.

The various studies also provide estimates of the mean \(\alpha\) and standard deviation
\(\beta\) of \(\log(\hat{\sigma}_t)\). These can be compared with the typical estimates of \(\alpha\) and \(\beta\) for
\(\log(\sigma_t)\) in the standard SV model that are mentioned in Section 11.6. The
high-frequency estimates of \(\alpha\) are \(-5.05\) for currencies in ABDL, \(-4.71\) for the equity
index in AT, and \(-4.13\) for the median stock in ABDE. The estimates of \(\beta\) include
the impact of measurement errors and are 0.35 for currencies, 0.33 for the index,
and 0.26 for the median stock. The FX estimates are comparable to the
low-frequency FX estimates given in Sections 11.6 and 11.7.


12.9.4 The Autocorrelations of Realized Volatility 


We should expect to find substantial positive autocorrelation in realized volatility, 
because RV is a fairly accurate estimate of volatility that is known to be highly 
persistent from our consideration of ARCH and SV models. We should also expect 
the positive dependence in RV to exceed that of squared daily returns because 
the latter quantity is a much more noisy estimate of volatility than is RV. These 
expectations are confirmed by the empirical evidence. All four studies estimate 
the first autocorrelation of \(\log(\hat{\sigma}_t)\) to be between 0.60 and 0.65 and they all find
that the first 100 autocorrelations are all positive.

It is important to remember that measurement error reduces autocorrelations.
The autocorrelations of \(\log(\hat{\sigma}_t)\) and \(\log(\sigma_t)\) are proportional to each other for
stationary processes; thus,

\[ \operatorname{cor}(\log(\hat{\sigma}_t), \log(\hat{\sigma}_{t+\tau})) = \delta \operatorname{cor}(\log(\sigma_t), \log(\sigma_{t+\tau})), \quad \tau \ge 1, \tag{12.34} \]

with

\[ \delta = \operatorname{var}(\log(\sigma_t)) / \operatorname{var}(\log(\hat{\sigma}_t)) < 1, \tag{12.35} \]

assuming the terms \(\log(\hat{\sigma}_t / \sigma_t)\) are i.i.d. Hence the correlation between consecutive
values of \(\log(\sigma_t)\) is estimated to be more than 0.6. Values exceeding 0.7 appear
to be plausible.
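
The proportionality between the two sets of autocorrelations can be verified in a few lines. The derivation below writes \(e_t\) for the logarithmic measurement error and assumes, in addition to the i.i.d. property, that the errors are independent of the latent volatility process. That independence is an assumption made here for the sketch.

```latex
\begin{align*}
\log(\hat{\sigma}_t) &= \log(\sigma_t) + e_t,
  \qquad e_t = \log(\hat{\sigma}_t / \sigma_t) \ \text{i.i.d., independent of } \{\sigma_s\}, \\
\operatorname{cov}\bigl(\log(\hat{\sigma}_t), \log(\hat{\sigma}_{t+\tau})\bigr)
  &= \operatorname{cov}\bigl(\log(\sigma_t), \log(\sigma_{t+\tau})\bigr),
  \qquad \tau \ge 1, \\
\operatorname{var}\bigl(\log(\hat{\sigma}_t)\bigr)
  &= \operatorname{var}\bigl(\log(\sigma_t)\bigr) + \operatorname{var}(e_t), \\
\operatorname{cor}\bigl(\log(\hat{\sigma}_t), \log(\hat{\sigma}_{t+\tau})\bigr)
  &= \frac{\operatorname{var}(\log(\sigma_t))}{\operatorname{var}(\log(\hat{\sigma}_t))}\,
     \operatorname{cor}\bigl(\log(\sigma_t), \log(\sigma_{t+\tau})\bigr).
\end{align*}
```

As a purely hypothetical illustration, if the variance ratio were 0.85 then an observed first autocorrelation of 0.62 would correspond to a latent autocorrelation of about 0.62/0.85, or roughly 0.73.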

Figure 12.14 shows the autocorrelations of \(\log(\hat{\sigma}_t)\) obtained by AT at lags 1 to
250. They decay slowly, as previously emphasized by ABDL and ABDE, and
suggest we should consider a long memory model for volatility. A clear minor 
calendar effect can be seen in Figure 12.14, with the local maxima at lags 5, 
10, 15, and 20 indicating variation in equity volatility across the five days of the 
week. Taylor and Xu (1997) show that average realized DM/$ volatility increases 
from Monday to Friday, presumably reflecting the timing of macroeconomic 
announcements. 


12.9.5 Fractional Integration of Realized Volatility 


A fractionally integrated process can explain the slow decay in the autocorrelations
of realized volatility. The degree of fractional integration, denoted by \(d\),
is then between zero and one. We have already encountered the parameter \(d\)
in Sections 10.3 and 11.10, where evidence of long memory volatility effects
in low-frequency returns was noted. The evidence is particularly impressive for
high-frequency returns, both in the four studies of realized volatility and also in
the related work of Andersen and Bollerslev (1997a) and Bollerslev and Wright
(2000).

Figure 12.14. Autocorrelations of the logarithm of realized
volatility for the FTSE 100 index from 1990 to 1998.

The fractional parameter \(d\) appears in the asymptotic shapes of autocorrelations,
spectral densities, and variance ratios, as explained in Section 3.8. The variance \(S_T\)
of the sum of \(T\) consecutive observations has the scaling law \(T^{-(2d+1)} S_T \to c\) as
\(T \to \infty\) for a positive constant \(c\). Thus \(\log(S_T)\) is asymptotically a linear function
of \(\log(T)\). The empirical data rather remarkably support a linear relationship for
small values of \(T\). Figure 12.15 shows this for the data of AT over the range
\(1 \le T \le 128\), from which \(d\) is estimated to be 0.42. Likewise, ABDL estimate \(d\)
as 0.39 and 0.36 and ABDE have a median estimate of 0.39, from regressions over
\(1 \le T \le 30\). These estimates are for the logarithms of realized volatility, \(\log(\hat{\sigma}_t)\).
Measurement error has no effect on the asymptotic result, so the estimates can
be used for \(\log(\sigma_t)\). The estimates are also applicable to \(\sigma_t\) and \(\sigma_t^2\), from theory
in Andersen and Bollerslev (1997a). The regressions do not, however, provide
standard errors for \(d\).
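
The scaling-law estimate amounts to a straight-line fit on logarithmic axes. If the variance of the sum of \(T\) observations grows like \(T^{2d+1}\), the slope of its logarithm against the logarithm of \(T\) is \(2d + 1\). The Python sketch below assumes that relationship and uses non-overlapping blocks to estimate each variance. The function names and the blocking scheme are illustrative assumptions and are not necessarily the procedure of the cited studies.

```python
import math


def block_sum_variance(series, block_length):
    """Sample variance of sums over non-overlapping blocks (needs at least two blocks)."""
    sums = [sum(series[i:i + block_length])
            for i in range(0, len(series) - block_length + 1, block_length)]
    mean = sum(sums) / len(sums)
    return sum((s - mean) ** 2 for s in sums) / (len(sums) - 1)


def ols_slope(x, y):
    """Least squares slope of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def d_from_scaling(block_lengths, variances):
    """Estimate d from the slope 2d + 1 of log(S_T) against log(T)."""
    slope = ols_slope([math.log(t) for t in block_lengths],
                      [math.log(s) for s in variances])
    return (slope - 1.0) / 2.0


# Synthetic check: variances that follow the scaling law exactly with d = 0.4.
lengths = [1, 2, 4, 8, 16, 32, 64, 128]
exact_variances = [0.5 * t ** 1.8 for t in lengths]
d_check = d_from_scaling(lengths, exact_variances)
```

As the text notes, a regression of this kind gives a point estimate only. The usual OLS standard errors are not appropriate because the variances at different \(T\) are computed from the same data.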


Figure 12.15. Scaling plot for daily logarithms of realized
volatility; \(S_T\) is the variance of the sum of \(T\) consecutive observations.


The estimate of \(d\) devised by Geweke and Porter-Hudak (1983) relies on the
unbounded shape of the spectral density \(f(\omega)\) for low frequencies \(\omega\). The theoretical
density is proportional to \(\omega^{-2d}\) as \(\omega \to 0\) and then \(\omega^{2d} f(\omega) \to C\) for some
positive \(C\). The GPH estimate is obtained from \(n\) observations and \(m < n\) values
of the sample periodogram \(I(\omega_j)\) that estimate \(f(\omega_j)\) at the frequencies
\(\omega_j = 2\pi j/n\), \(j = 1, 2, \ldots, m\). The least squares regression of \(\log(I(\omega_j))\) on
\(\log(\omega_j)\) produces a slope estimate \(\hat{\beta}\), from which \(d\) is estimated by \(\hat{d} = -\hat{\beta}/2\). The
distribution of \(\hat{d}\) is approximately normal with mean \(d\) and standard deviation
equal to \(\pi/\sqrt{24m}\), providing \(m\) is small relative to \(n\). The standard error of \(\hat{d}\)
can also be estimated from the standard OLS value for \(\hat{\beta}\). The selection of \(m\) is
problematic and involves a trade-off between bias and variation in \(\hat{d}\). Bias is a
problem if \(m\) is too large, while \(\hat{d}\) is inaccurate if \(m\) is too small. Most researchers
set \(m = n^{\theta}\) for a power \(\theta\) between 0.5 and 0.8. For further details of the theory
of the GPH test, see Baillie (1996) and Bollerslev and Wright (2000).
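
A log-periodogram regression of this kind can be sketched with the Python standard library alone. The sketch regresses the logarithm of the periodogram on the logarithm of frequency for the first \(m\) Fourier frequencies and reports minus half the slope, together with the asymptotic standard deviation \(\pi/\sqrt{24m}\). The function names are hypothetical, and the direct summation used for the Fourier transform is chosen for clarity, not speed.

```python
import cmath
import math
import random


def periodogram_ordinate(x, j):
    """Periodogram of the mean-adjusted series at frequency 2 * pi * j / n."""
    n = len(x)
    mean = sum(x) / n
    omega = 2.0 * math.pi * j / n
    dft = sum((value - mean) * cmath.exp(-1j * omega * t)
              for t, value in enumerate(x))
    return abs(dft) ** 2 / (2.0 * math.pi * n)


def gph_from_ordinates(omegas, ordinates):
    """Return (d_hat, asymptotic_sd) from a log-log least squares regression."""
    lx = [math.log(w) for w in omegas]
    ly = [math.log(value) for value in ordinates]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
             / sum((a - mean_x) ** 2 for a in lx))
    return -slope / 2.0, math.pi / math.sqrt(24.0 * len(omegas))


def gph_estimate(x, m):
    """Log-periodogram estimate of d from the first m Fourier frequencies."""
    n = len(x)
    omegas = [2.0 * math.pi * j / n for j in range(1, m + 1)]
    ordinates = [periodogram_ordinate(x, j) for j in range(1, m + 1)]
    return gph_from_ordinates(omegas, ordinates)


# Illustrative run on simulated white noise, for which d is zero.
rng = random.Random(2024)
noise = [rng.gauss(0.0, 1.0) for _ in range(128)]
d_noise, sd_noise = gph_estimate(noise, m=11)
```

A typical call would pass the series of logarithms of realized volatility and a value such as `m = int(n ** 0.8)`. The estimate should then be examined over a range of `m`, as the sensitivity to that choice is the main practical difficulty.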

An additive measurement error does not change the shape of the spectral density
at low frequencies. Therefore the GPH estimates of \(d\) obtained from \(\log(\hat{\sigma}_t)\) are
also applicable to \(\log(\sigma_t)\), \(\sigma_t\), and \(\sigma_t^2\). ABDL graphed \(\hat{d}\) against \(m = n^{\theta}\), looking
for a region in which \(\hat{d}\) is not sensitive to the choice, as suggested by Taqqu and
Teverovsky (1996). ABDL estimate \(d\) to be 0.42 and 0.45 when \(\theta = 0.8\), with
standard errors below 0.03. ABDE prefer \(\theta = 0.6\) and obtain a median estimate
of \(d\) equal to 0.35, with all their estimates having a standard error of 0.07. Ebens
and AT both use \(\theta = 0.8\) to respectively obtain equity index estimates of 0.40
(s.e. 0.04) and 0.43 (s.e. 0.03). It is notable that these estimates are all in the range
from 0.35 to 0.45. They are also close to the unsophisticated estimates given by the
scaling law methodology. All the GPH estimates comprehensively reject \(d = 0\),
if we can rely on an asymptotic hypothesis test applied when \(m\) is a significant
proportion of \(n\). Figure 12.16 shows the estimates of \(d\) (and 95% confidence
intervals) for the AT data, as \(m\) increases from 40 to 500, with \(n = 2075\).

Figure 12.16. GPH estimates of the degree of fractional integration, \(d\), as
a function of the number of periodogram ordinates, \(n^{\theta}\), used in their calculations.
The filtered series

$y_t = (1 - L)^{\hat{d}} \log(\hat{\sigma}_t)$ (12.36)

is examined by ABDE and E to see if realized equity volatility responds asym-
metrically to price rises and falls. They find that the sign of the return for day
$t - 1$ provides information about the expected value of $y_t$, which is higher after a
price fall than a price rise. AT observe that their values of $y_t$ appear to be almost
white noise.


12.9.6 Short or Long Memory? 


The empirical evidence discussed here for RV shows that fractionally integrated 
processes are plausible for volatility, supporting similar empirical evidence for 
ARCH and SV models noted in Sections 10.3 and 11.10. Granger (1980) proved
that long memory processes can arise when short memory processes are
aggregated. Specifically, when AR(1) components are aggregated and the AR(1)
parameters are drawn from a beta distribution then the aggregated process converges
to a long memory process as the number of components increases. Andersen and
Bollerslev (1997a) develop the theoretical results in more detail when the context
is aggregating volatility components. It is certainly credible to assert that volatil- 
ity reflects several sources of news, that the persistence of shocks depends on the 
sources, and hence that total volatility has a long memory property. 

There are, however, alternatives to the long memory conclusion. Gallant, Hsu, 
and Tauchen (1999) estimate a volatility process for daily IBM returns that is the 
sum of only two short memory components yet the sum is able to mimic long 
memory. They also show that the sum of a particular pair of AR(1) processes has 
a spectral density very close to that of fractionally integrated white noise with 
$d = 0.4$ for frequencies $\omega > 0.01\pi$.

A sum of two AR(1) components is preferred for volatility in Alizadeh, Brandt, 
and Diebold (2002). Barndorff-Nielsen and Shephard (2001, 2002b) analyze ten 
years of DM/$ five-minute returns, adjusted for intraday volatility periodicity, and 
show that the sum of four short memory processes provides an excellent match 
to the autocorrelations of squared five-minute returns, which appear to display 
the long memory property of hyperbolic decay. Pong, Shackleton, Taylor, and Xu 
(2004) show that a sum of two AR(1) components forecasts currency volatility as 
accurately as a long memory model. 
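The way a sum of two short memory components can imitate hyperbolic decay can be explored numerically. The sketch below compares the autocorrelations of fractionally integrated white noise with those of a sum of two independent AR(1) processes, whose autocorrelation at lag $k$ is a weighted average $w\phi_1^k + (1 - w)\phi_2^k$. The grid search, the function names, and the choice of 100 lags are illustrative assumptions; none of the fitted values come from the studies cited above.

```python
import numpy as np


def acf_fractional_noise(d, max_lag):
    """Autocorrelations of ARFIMA(0, d, 0): rho_k = prod_{i<=k} (i - 1 + d)/(i - d)."""
    k = np.arange(1, max_lag + 1)
    return np.cumprod((k - 1 + d) / (k - d))


def acf_two_ar1(w, phi1, phi2, max_lag):
    """Autocorrelations of a sum of two independent AR(1) processes.

    w is the share of total variance contributed by the first component.
    """
    k = np.arange(1, max_lag + 1)
    return w * phi1 ** k + (1 - w) * phi2 ** k


def fit_two_ar1(target, phis):
    """Crude grid search over (phi1, phi2) with a least-squares weight w."""
    k = np.arange(1, target.size + 1)
    best = None
    for i, p1 in enumerate(phis):
        for p2 in phis[:i]:
            a, b = p1 ** k, p2 ** k
            w = np.dot(target - b, a - b) / np.dot(a - b, a - b)
            w = float(np.clip(w, 0.0, 1.0))
            err = float(np.max(np.abs(w * a + (1 - w) * b - target)))
            if best is None or err < best[0]:
                best = (err, w, float(p1), float(p2))
    return best


target = acf_fractional_noise(0.4, 100)
err, w, phi1, phi2 = fit_two_ar1(target, np.linspace(0.05, 0.995, 190))
print(f"w = {w:.3f}, phi1 = {phi1:.3f}, phi2 = {phi2:.3f}, max abs error = {err:.3f}")
```

The printed maximum error indicates how closely two components can track the long memory autocorrelations over the chosen lags; extending the lag range makes the match harder, since a finite sum of exponentials eventually decays faster than any power law.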

Occasional structural breaks, from one short memory process to another, are 
a second alternative to the long memory conclusion. Diebold and Inoue (2001) 
illustrate how regime switches can mimic long memory properties. Andreou and 
Ghysels (2002) show how to identify the number and location of multiple breaks in 
the volatility process. Granger and Hyung (2004) demonstrate that it is difficult 
to distinguish between break and long memory models, using simulations and 
analysis of absolute returns. 

Ohanissian, Russell, and Tsay (2004) develop and investigate a test of the null 
hypothesis that a process has a long memory. Their test compares estimates of 
the long memory parameter d for a set of data frequencies; it accepts the null 
hypothesis for ten years of five-minute FX returns at the 5% significance level. 


12.9.7 Applications 


The literature on realized volatility (RV) is growing rapidly and many interesting 
applications can be expected in the future, in addition to the few noted here. Maheu 
and McCurdy (2002) estimate time-series models for FX volatility from RV that 
incorporate semi-Markov switching between regimes. Bollerslev and Zhou (2002) 
use RV to estimate a continuous-time model for FX rates that contains two short 
memory volatility components and a jump component; more details of the price
process are given in Section 13.6. Fleming, Kirby, and Ostdiek (2003) show
that RV has economic value, as better portfolios can be constructed when the 
covariance matrix of asset returns is estimated from RV instead of from daily 
returns. They show that higher expected returns can be obtained, for the same 
level of portfolio variance, when RV is used to select a portfolio invested in cash, 
a stock index, Treasury bonds, and gold. The accurate measurement of volatility
by RV can be exploited when forecasting volatility (see Blair et al. 2001b; Maheu
and McCurdy 2002; ABDL 2003; and the specific forecasting results described
in Section 15.7).


12.10 Price Discovery 


High-frequency data are ideal for making inferences about how information is 
reflected by prices. As information is reflected rapidly, much more can be learnt 
from high-frequency records. We have already noted the impact of scheduled 
macroeconomic news in Section 12.4. 

Some news is not revealed publicly and has to be inferred from prices. Currency 
interventions by central banks are often not announced. Peiers (1997) finds that 
Bundesbank interventions are associated with Deutsche Bank making informed 
quotations, which precede quotations by other banks, up to one hour before 
Reuters announces the interventions. Chang and Taylor (1998) estimate ARCH 
models that incorporate dummy variables for interventions by the Bank of Japan. 
They infer that volatility increases thirty or more minutes before Reuters reports 
the interventions. Dominguez (2003) finds that some traders typically know that 
the Fed is intervening at least one hour before the public release of this information 
in newswire reports. 

Several research studies have compared the relative impact of information upon 
two markets. Many assets are simultaneously traded in different places. Compar- 
isons of prices for the same assets at different US equity markets have been made 
by Hasbrouck (1995, 2003), first for individual securities and more recently for 
equity indices. Two examples of price discovery research for European stocks 
that are simultaneously traded at domestic and US markets are Hupperets and 
Menkveld (2002) and Grammig, Melvin, and Schlag (2004). A few years ago, 
Bund futures were traded both on the floor of LIFFE and electronically by the 
Deutsche Terminbourse and then the relative price discovery of these two mar- 
kets depended on the level of volatility (Martens 1998). Nikkei futures have been 
traded in Japan and Singapore, providing an opportunity for traders in Osaka to 
learn about prices from the foreign market when limit rules close the domestic 
market (Martens and Steenbeek 2001). 

Spot and futures prices for the same asset have been compared to see which mar- 
ket first reveals information and to measure how long it takes for the slower market 
to catch up. Some representative contributions to this literature are Kawaller et
al. (1987), Stoll and Whaley (1990), Yadav and Pope (1990), De Jong and
Nijman (1997), and Taylor, van Dijk, Franses, and Lucas (2000). It has often been
concluded that equity futures prices lead spot index levels and hence that price 
discovery occurs first in the futures market. Another strand of research compares 
the impact of the same information on different assets, for example, on the prices 
of US and UK stocks as in Kofman and Martens (1997). 


12.11 Durations 


The times that elapse between events such as trades or price changes can be used to 
predict the times of future events and to explore microstructure theories. Engle and 
Russell (1997, 1998) define autoregressive conditional duration (ACD) models 
for durations that are analogous to ARCH models for returns. Let $x_i$ be the time
duration between events $i - 1$ and $i$. Then an ACD model specifies conditional
expectations for the times,

$\psi_i = E[x_i \mid x_{i-1}, x_{i-2}, \ldots]$, (12.37)

and a probability density function for the scaled times $x_i/\psi_i$. The scaled times
are assumed to be i.i.d., so that

$x_i = \psi_i \varepsilon_i$ (12.38)

with the random variables $\varepsilon_i$ being i.i.d. with unit mean. A simple example is
given by the WACD(1, 1) model, which has

$\psi_i = \omega + \alpha x_{i-1} + \beta \psi_{i-1}$ (12.39)

and a Weibull distribution for the $\varepsilon_i$, whose density is determined by a positive
parameter $\gamma$. The conditional densities of the times are then

$f(x_i \mid \psi_i) = \dfrac{\gamma}{x_i}\, y_i \exp(-y_i)$ with $y_i = \left(\dfrac{\Gamma(1 + 1/\gamma)\, x_i}{\psi_i}\right)^{\gamma}$. (12.40)

The special case $\gamma = 1$ defines conditional exponential distributions, for which
$f(x_i \mid \psi_i) = \exp(-x_i/\psi_i)/\psi_i$. Also, the variables $\varepsilon_i^{\gamma}$ have exponential
distributions with a mean that depends on $\gamma$. Much of the theory of GARCH(1, 1)
models can be adapted for WACD(1, 1) models. In particular, the parameters can
be estimated by maximizing the product of the conditional densities and standard
errors can be obtained from the logarithm of this likelihood function (Engle and
Russell 1998).

Duration times have intraday periodic patterns that can be incorporated into the
conditional expectations $\psi_i$. Engle and Russell (1997) use multiplicative
adjustment factors for the O&A dataset. They work with a subset of the DM/\$
quotation times and find that the adjusted durations are autocorrelated. Their Table 3
includes the estimates $\hat{\alpha} = 0.07$, $\hat{\beta} = 0.90$, and $\hat{\gamma} = 0.91$ for the WACD(1, 1)
model and they reject the null hypothesis $\gamma = 1$. Diagnostic tests are performed
by checking the distribution and autocorrelations of the transformed estimated
residuals $\hat{\varepsilon}_i^{\hat{\gamma}}$, with fairly satisfactory results. The WACD model is used to
investigate microstructure hypotheses. One conclusion is that the bid-ask spread can be
used to improve predictions of durations; higher spreads reduce durations,
consistent with an asymmetric information model. Another hypothesis, that traders
follow a price leader, is not supported by the data.
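A WACD(1, 1) model of the kind defined by (12.37) to (12.40) can be simulated and its log-likelihood evaluated with a short script. The following is a minimal sketch, not the estimation code of Engle and Russell: the function names, the parameter values, and the start-up values for $\psi_1$ (the unconditional mean when simulating, the sample mean when evaluating the likelihood) are all assumptions made for illustration.

```python
from math import gamma

import numpy as np


def simulate_wacd(n, omega, alpha, beta, gam, rng):
    """Simulate n durations from a WACD(1, 1) model with unit-mean Weibull errors."""
    scale = 1.0 / gamma(1.0 + 1.0 / gam)  # rescale so that E[eps] = 1
    eps = scale * rng.weibull(gam, size=n)
    x = np.empty(n)
    psi = np.empty(n)
    psi[0] = omega / (1.0 - alpha - beta)  # unconditional mean (assumed start)
    x[0] = psi[0] * eps[0]
    for i in range(1, n):
        psi[i] = omega + alpha * x[i - 1] + beta * psi[i - 1]
        x[i] = psi[i] * eps[i]
    return x, psi


def wacd_loglik(x, omega, alpha, beta, gam):
    """Sum of the log conditional densities log f(x_i | psi_i) from (12.40)."""
    x = np.asarray(x, dtype=float)
    psi = np.empty(x.size)
    psi[0] = x.mean()  # start-up value: an assumption, not from the text
    for i in range(1, x.size):
        psi[i] = omega + alpha * x[i - 1] + beta * psi[i - 1]
    y = (gamma(1.0 + 1.0 / gam) * x / psi) ** gam
    return float(np.sum(np.log(gam / x) + np.log(y) - y))


rng = np.random.default_rng(12345)
x, _ = simulate_wacd(5000, omega=0.2, alpha=0.1, beta=0.7, gam=0.9, rng=rng)
print("mean duration:", x.mean())
print("log-likelihood at the simulation parameters:", wacd_loglik(x, 0.2, 0.1, 0.7, 0.9))
print("log-likelihood with gamma fixed at 1:", wacd_loglik(x, 0.2, 0.1, 0.7, 1.0))
```

Maximizing `wacd_loglik` over the four parameters with a numerical optimizer would give maximum likelihood estimates; that step is omitted here to keep the sketch short.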

Engle and Russell (1998) provide more theory and estimate similar models 
for three months of IBM transactions ending in 1991. Their parameter estimates 
include $\hat{\alpha} = 0.06$, $\hat{\beta} = 0.93$, and $\hat{\gamma} = 0.91$ for the WACD(1, 1) model. These are
similar values to the currency estimates except that the persistence estimate $\hat{\alpha} + \hat{\beta}$
is almost one. They also report results for the more general WACD(2, 2) model
and test hypotheses about the sources of the clustering of transaction times. They 
present evidence that clustering occurs when either informed traders or liquidity 
traders are active. Engle (2000) provides further analysis of the same IBM dataset 
and covers several microstructure theories. Evidence is found to support a model 
of Easley and O'Hara (1992), interpreted as no trade means no news. Longer 
durations are associated with lower volatility, while higher bid-ask spreads and
higher volume both predict rising volatility. 

Dufour and Engle (2000) is a more comprehensive study of equity transactions 
for eighteen firms during the same three-month period. They review the literature 
about asymmetric information models. These models imply that trades convey 
information and thus duration data may contain relevant information about prices. 
They then generalize a vector model of Hasbrouck (1991), for trades and quote 
revisions, that separates the impact of public and private information. They find 
that short inter-trade durations (and hence high trading activity) are related to 
both larger quote revisions and stronger positive autocorrelation of trades. For 
example, when a buy order is executed immediately after a previous order it is 
more likely to be followed by another buy. High trading activity is associated 
with large spreads, high volume, a high price impact of trades, and hence high 
informational content. 

Duration models can also be defined by analogy with stochastic volatility mod- 
els. Examples of SV duration models can be found in Gourieroux and Jasiak 
(2001), Bauwens and Veredas (2004), and Ghysels, Gourieroux, and Jasiak (2004). 


12.12 Extreme Price Changes 


It is difficult to estimate the probabilities of extreme price changes from low- 
frequency data because few extremes are then observed. High-frequency data pro- 
vide more data in the extreme percentiles of the empirical distributions, which can 
be used to estimate the asymptotic shape of the distribution of returns. Extremal
theory is covered in the books of Leadbetter, Lindgren, and Rootzén (1983) and
Embrechts, Klüppelberg, and Mikosch (1997). 

Unbounded distributions have only two possible asymptotic shapes. Their den- 
sities either decline exponentially or they follow a power law in the tails of the 
distribution. The former possibility applies to the standard stochastic volatility 
model of Section 11.5, for which all moments are finite. The other possibility 
occurs for general ARCH models, even when the conditional distributions are 
normal. It also occurs for SV models that have heavy tails, as in Section 11.8. 

When a power law applies, the cumulative distribution function for returns $r$
has the following approximate form for the right-hand tail:

$F(r) = 1 - a r^{-\alpha}$, (12.41)

for a positive power $\alpha$ and a positive constant $a$ that depends on the scale parameter
of the distribution. More precisely,

$F(r) = 1 - r^{-\alpha} G(r)$, (12.42)

where $G$ is a function that varies slowly as $r$ increases, i.e. the limit of $G(\lambda r)/G(r)$
as $r \to \infty$ is one for all positive $\lambda$. A similar definition applies to the left-hand
tail, although the two tails can have different values of the tail index $\alpha$ for an
asymmetric distribution.

Assuming symmetry, $\alpha$ determines the range of finite moments. A finite moment
of order $p$ exists if and only if $p < \alpha$. This result shows that the value of $\alpha$ is
invariant to aggregation. Thus we have a very constructive role for high-frequency
returns: if we can estimate $\alpha$ from them, then the same value can be used to
characterize extreme returns for daily returns and other low frequencies.

The estimate of $\alpha$ given by Hill (1975) uses the $m$ most extreme of $n$
observations, which are arranged in descending order, so that $r_{(1)} \ge r_{(2)} \ge \cdots \ge r_{(n)}$.
We may suppose the observations are returns, adjusted by subtracting the sample
mean. For i.i.d. data, and assuming equality in (12.41), the maximum likelihood
estimate of the tail index (in the right tail) is then $\hat{\alpha}$ defined by

$\dfrac{1}{\hat{\alpha}} = \dfrac{1}{m - 1} \sum_{i=1}^{m-1} \left[\log(r_{(i)}) - \log(r_{(m)})\right]$. (12.43)

The left-tail estimate for return data is given by multiplying the observations by
minus one and then arranging the data in descending order.
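The Hill estimate (12.43) is straightforward to compute. The sketch below is an illustration under assumptions: the function name `hill_tail_index` and its `demean` switch are hypothetical, and the simulated input is i.i.d. Pareto data with a known tail index rather than returns with ARCH effects, so none of the technical issues discussed next are addressed.

```python
import numpy as np


def hill_tail_index(returns, m, demean=True):
    """Hill estimate of the right-tail index from the m largest observations."""
    r = np.asarray(returns, dtype=float)
    if demean:
        r = r - r.mean()
    top = np.sort(r)[::-1][:m]  # r_(1) >= r_(2) >= ... >= r_(m)
    if top[-1] <= 0.0:
        raise ValueError("the m largest observations must all be positive")
    # 1/alpha_hat is the average of log(r_(i)) - log(r_(m)) for i = 1, ..., m - 1
    inv_alpha = np.mean(np.log(top[:-1]) - np.log(top[-1]))
    return 1.0 / inv_alpha


rng = np.random.default_rng(7)
sample = 1.0 + rng.pareto(3.0, size=100_000)  # classical Pareto, tail index 3
print("right tail:", hill_tail_index(sample, m=1000, demean=False))
# A left-tail estimate for returns would use hill_tail_index(-returns, m).
```

For exact Pareto data the estimate is close to the true index for a wide range of $m$; for returns the choice of $m$ matters far more, as the following paragraph explains.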

There are technical issues to face when using the Hill estimate, including the
selection of $m$, bias in the MLE $\hat{\alpha}$, and the correct calculation of the standard
error when the data contain ARCH effects and are hence not i.i.d. Daníelsson and
de Vries (1997) solve these technical problems and estimate the tail indices for the
O&A database. They find $\hat{\alpha}$ is between 3.5 and 4.5 for either tail of ten-minute
returns for the DM/\$ and yen/\$ rates. They also estimate that a 1% (or more)
increase in the DM/\$ rate during a ten-minute period will occur at an average
frequency of once per year.

Dacorogna et al. (2001) apply similar methods to many series. Their Table 5.3
includes estimates of $\hat{\alpha}$ for thirty-minute returns (measured in $\vartheta$-time) from seven
dollar exchange rates during the decade from 1987 to 1996. The seven estimates
range from 3.18 to 3.58, with a maximum standard error of 0.26. Therefore,
models with infinite kurtosis appear credible for exchange rates. Their once-a-year
event is a 1.7% (or more) increase in an exchange rate during a six-hour
period.


12.13 Daily High and Low Prices 


There are many databases of daily high, low, open, and close prices, particularly 
for futures markets. Although the daily high and low are only two numbers, 
they can provide much of the information that can be discovered about volatility 
from a complete intraday price record. Parkinson (1980) provided one of the first
estimators of volatility from high and low prices, respectively $h_t$ and $l_t$, defined
by

$\hat{\sigma}_t^2 = \dfrac{(\log(h_t) - \log(l_t))^2}{4\log(2)}$. (12.44)



Assuming intraday prices follow geometric Brownian motion (defined in the next 
chapter), this estimator is much more accurate than the squared daily return and 
it is unbiased when expected returns are zero. 
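The estimator (12.44) and its approximate unbiasedness can be illustrated by simulation. The sketch below assumes driftless Brownian motion for the log price, observed on a fine but finite grid; the function names, the daily volatility of 1%, and the grid of 1000 steps per day are illustrative assumptions. Because the true high and low are only observed on the grid, the simulated average falls slightly below $\sigma^2$.

```python
import numpy as np


def parkinson_variance(high, low):
    """Parkinson's estimate of the daily variance from high and low prices."""
    high = np.asarray(high, dtype=float)
    low = np.asarray(low, dtype=float)
    return (np.log(high) - np.log(low)) ** 2 / (4.0 * np.log(2.0))


def simulated_high_low(n_days, n_steps, sigma, rng):
    """Highs and lows of driftless log-price paths that start each day at log price 0."""
    steps = rng.normal(0.0, sigma / np.sqrt(n_steps), size=(n_days, n_steps))
    log_prices = np.concatenate(
        [np.zeros((n_days, 1)), np.cumsum(steps, axis=1)], axis=1
    )
    return np.exp(log_prices.max(axis=1)), np.exp(log_prices.min(axis=1))


sigma = 0.01
rng = np.random.default_rng(2024)
high, low = simulated_high_low(n_days=2000, n_steps=1000, sigma=sigma, rng=rng)
estimates = parkinson_variance(high, low)
print("average estimate / true variance:", estimates.mean() / sigma ** 2)
```

Reducing `n_steps` makes the downward bias more visible, which is one practical reason why estimators of this type are biased when trading is not continuous.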

Parkinson's estimator is more accurate than the sum of five squared intraday 
returns. More accurate estimators are given by Garman and Klass (1980), who 
also use daily open and close prices to find a quadratic estimator that is as accurate 
as the sum of eight squared intraday returns. Further results appear in Beckers 
(1983), Rogers and Satchell (1991), and Yang and Zhang (2000). The assumptions 
of continuous trading, constant volatility, and a martingale for the price logarithm 
are, however, all theoretical ideals so that estimators like (12.44) will be biased. 


12.13.1 Models for the Daily Range 


Daily ranges can be modeled either by conditional distributions, using meth- 
ods similar to those described for durations in Section 12.11, or directly from 
stochastic volatility models. We focus on the latter methodology, while Chou 
(2004) develops the alternative approach. 

Suppose now that expected returns are zero and that daily returns have the 
stochastic volatility representation studied in Chapter 11: 


$r_t = \sigma_t u_t$ (12.45)

with the latent volatility $\sigma_t$ independent of $u_t \sim N(0, 1)$. As in (12.44), we can
consider modeling and predicting $\sigma_t$ by studying the logarithm of the daily range,
namely

$R_t = \log(h_t) - \log(l_t)$. (12.46)

Assuming a diffusion process for intraday prices, with constant intraday volatility,
the daily range can also be factorized to give

$R_t = \sigma_t v_t$, (12.47)

with $v_t$ independent of $\sigma_t$. Equation (12.47) is also applicable when there is a
fixed periodic pattern in intraday volatility. These patterns change the distribution
of the times at which highs and lows occur but they do not change the distribution
of the range.

Taylor (1987) shows that the autocorrelations of $R_t$, like those of $|r_t|$, are
proportional to the autocorrelations of $\sigma_t$. The constant of proportionality is much
higher for the ranges $R_t$, because the coefficient of variation is much less for $v_t$
than it is for $|u_t|$; the former value is $E[v_t^2]/E[v_t]^2 = 1.09$ and the latter is
$E[u_t^2]/E[|u_t|]^2 = 1.57$ for geometric Brownian motion. The higher autocorrelation for ranges predicted
by theory is found in four years of exchange rate data. At lags 1, 10, and 50 they are 
0.51, 0.42, and 0.26 for ranges, compared with 0.25, 0.16, and 0.06 for absolute 
returns. The high dependence in the ranges is then used to predict volatility. 
Byers and Peel (2001) also document substantial positive autocorrelation among 
daily ranges, which leads them to recommend and estimate fractionally integrated 
processes. 

We now consider several results presented in Alizadeh et al. (2002). They
make the important observation that the distribution of $v_t = R_t/\sigma_t$ is close to
lognormal when intraday prices follow a driftless geometric Brownian motion
process. Equivalently, the conditional distribution of $R_t \mid \sigma_t$ is nearly lognormal.
They evaluate the density of $\log(v_t)$ and find that it is almost normal, with
skewness and kurtosis approximately equal to 0.17 and 2.80 respectively. This implies
that the state space representation for the logarithm of the range is almost
Gaussian, for the standard SV model. The measurement equation is then

$\log(R_t) = \log(\sigma_t) + \xi_t$, (12.48)

the residual term is $\xi_t = \log(v_t)$ and the transition equation is the same as in
Section 11.5:

$\log(\sigma_t) = (1 - \phi)\alpha + \phi \log(\sigma_{t-1}) + \eta_t$.

Parameter estimation by quasi-maximum likelihood is now much more efficient
than when applied to the logarithms of absolute returns, for two reasons. The first
is the approximate normality of the measurement errors in (12.48) that contrasts
with the skewed and very leptokurtic distribution that occurs in (11.25). The
second is a very substantial reduction in the standard deviation of the residual
terms, from 1.11 to 0.29.

Alizadeh et al. (2002) also observe that bid-ask effects are almost irrelevant
when calculating ranges but they can have a significant impact on the realized 
volatility measure discussed in Sections 12.8 and 12.9. Thus volatility predic- 
tion from ranges may be as good as from realized volatility. Their empirical 
results are for currency futures traded on five exchange rates in Chicago from 
1978 to 1998. Estimation of the parameters of the standard stochastic volatility 
model from ranges gives persistence estimates that are much lower than those 
produced by the methods described in Chapter 11. This is attributed to model 
mis-specification. Both long memory and two-factor models for volatility may 
instead be satisfactory. They prefer a two-factor transition equation, which can 
be stated as 


$\log(\sigma_t) = \alpha + \log(\sigma_{1,t}) + \log(\sigma_{2,t})$ (12.49)

with

$\log(\sigma_{i,t}) = \phi_i \log(\sigma_{i,t-1}) + \eta_{i,t}$, $i = 1, 2$. (12.50)

Their estimates indicate that one factor is highly persistent, with $\hat{\phi}_1 = 0.97$ or
0.98, and that the other is almost uncorrelated, with $0 < \hat{\phi}_2 < 0.2$. The estimated
variances of the two factors are similar. Note that the special case $\phi_2 = 0$ defines a
volatility model similar to the SVt model given by equations (11.42) and (11.43).


12.14 Concluding Remarks 


High-frequency data are essential for some research and provide answers to sev- 
eral questions that cannot be answered using daily data. The very rapid response 
of prices to new information requires study of frequent prices in order to both 
understand the impact of scheduled news and major events and to see the effects 
predicted by microstructure theories. It seems probable that the major insights 
from future research into market prices will come from high-frequency analysis. 

High-frequency prices have statistical properties that are unique to intraday 
data, the most notable being periodic effects in transactions, volatility, and trading 
volume, which repeat from day to day yet vary across markets. The significant 
size of high-frequency datasets is beneficial when estimating daily volatility and 
the frequency of extreme price movements. Measures of realized volatility, in 
particular, have recently clarified many of the stochastic properties of volatility. 
The deeper understanding of volatility that is obtained can be utilized by volatility 
forecasters, derivatives traders, and portfolio and risk managers.


12.15 Appendix: Formulae for the Variance of the Realized Volatility Estimator


The most general SV model defined in Section 12.8 has
$r_{t,j,N} = \sigma_{t,j,N} u_{t,j,N}/\sqrt{N}$ and $u_{t,j,N} \sim N(0, 1)$, with $u_{t,j,N}$ independent of
$\sigma_{t,j,N}$ and all variables $u$ and $\sigma$ that are determined before time $t, j$. The weaker
assumption that the $u_{t,j,N}$ are i.d. with mean zero, variance one, and kurtosis $k_u$
is made here. To derive (12.29), let

$y_{t,j,N} = r_{t,j,N}^2 - \dfrac{1}{N}\sigma_{t,j,N}^2 = \dfrac{1}{N}\sigma_{t,j,N}^2 (u_{t,j,N}^2 - 1)$

and note that the variables $y_{t,j,N}$ have mean zero and are uncorrelated, with

$\mathrm{var}(y_{t,j,N}) = \dfrac{1}{N^2} E[\sigma_{t,j,N}^4]\, E[u_{t,j,N}^4 - 2u_{t,j,N}^2 + 1] = \dfrac{k_u - 1}{N^2} E[\sigma_{t,j,N}^4]$.

Thus

$\hat{\sigma}_{t,N}^2 - \sigma_t^2 = \sum_{j=1}^{N} y_{t,j,N}$

has mean zero and variance

$\dfrac{k_u - 1}{N^2} \sum_{j=1}^{N} E[\sigma_{t,j,N}^4]$. (12.51)

Substituting $k_u = 3$ for conditional normal distributions gives (12.29). The
special cases $\sigma_{t,j,N} = \sigma_t$ and $\sigma_{t,j,N}^2 = N\lambda_{j,N}\sigma_t^2$ respectively give (12.21) and
(12.23).

The variance in (12.51) converges to zero as N increases if further assumptions 
are made. One set of sufficient conditions is that $k_u$ is bounded, and that


xw aper rnt t as N — oo. (12.52) 




Then 


Exe qn jm 2 

N- 4 2 

am Beie € mt ein 
j=l j=l 


and the upper bound converges to zero as $N$ increases. For the special case of
periodic volatility effects, defined by (12.22), the random variables $\lambda_{t,j,N}$ equal the
constants $\lambda_{j,N}$ given in (12.24).

The variance in (12.51) does not converge to zero when the day's volatility is
concentrated around some parts of the day, i.e. when condition (12.52) does not
hold. Following an example in Barndorff-Nielsen and Shephard (2002a), consider
the special situation when all price changes are jumps with

$r_{t,j,N} \mid k \text{ jumps} \sim N(0, k\xi^2)$.

Then let $J_{t,j,N}$ be the number of jumps in period $t, j$ and let $J_t$ be the total number
for day $t$, assumed to be finite. The latent volatility variables are then

$\sigma_t^2 = \xi^2 J_t$ and $\sigma_{t,j,N}^2 = N\xi^2 J_{t,j,N}$.

Assume $\xi^2 > 0$ and $E[J_t] > 0$. The variance in (12.51) now equals

$\dfrac{2}{N^2} \sum_{j=1}^{N} E[\sigma_{t,j,N}^4] = 2\xi^4 \sum_{j=1}^{N} E[J_{t,j,N}^2] \ge 2\xi^4 \sum_{j=1}^{N} E[J_{t,j,N}] = 2\xi^4 E[J_t]$

and consequently the realized variance does not converge to $\sigma_t^2$. Instead, realized
variance converges to the sum of the squared jumps which defines the quadratic
variation. The difference between the squared latent volatility and the quadratic
variation may then be small, however, when the expected number of jumps in a
day is large.


Part V 


Inferences from Option Prices 


13 


Continuous-Time Stochastic Processes 


Diffusion and jump processes that are defined for a continuous range of times are 
described in this chapter and used to construct a variety of processes for prices 
and their stochastic volatility. These processes are of particular importance when 
option prices are considered in later chapters. 


13.1 Introduction 


The stochastic processes that describe prices in the previous chapters only pro- 
vide probability distributions for asset prices at discrete moments in time, typically 
once every day or once every five minutes. Processes defined for a continuous 
range of times are also interesting. They are important when pricing option con- 
tracts, whose prices can help us to learn more about future asset prices, as we will 
see in the remainder of this book. 

This chapter provides an introduction to the definitions and properties of several 
continuous-time stochastic processes, which are encountered in the chapters that 
follow. A rigorous discussion of these processes requires far more mathematics 
than is deployed here. More theory can be found in the texts by Baxter and Rennie 
(1996), Cont and Tankov (2003), Etheridge (2002), and Mikosch (1998). 

We consider processes of increasing complexity, concluding in Section 13.6 
with bivariate processes that provide a fairly realistic description of prices and 
their stochastic volatility. We commence with diffusion processes that contain no 
jumps as time progresses. The Wiener process, also called Brownian motion, is 
described in Section 13.2. It is used to construct univariate and bivariate diffusions,
respectively in Sections 13.3 and 13.4. Processes that only contain jumps are 
defined in Section 13.5. Mixed processes that incorporate both diffusion and 
jump components conclude the chapter. 

The continuous-time processes used in finance are more often motivated by 
theoretical convenience than by empirical analysis. Consequently, it is common to 
ignore stylized facts such as the periodic intraday variation in volatility described 
in Section 12.5. To simplify our descriptions, we do the same. Volatility processes 
that are more realistic can be created by using a deterministic multiplier that 
represents the periodic effect.

Several notational changes are introduced in this chapter. The time variable t
previously counted trading periods. The symbol t is now used differently, to refer 
to a time on a continuous scale. The units of t are not important, although the 
conventional units are years for finance applications. A typical random variable 
defined for a continuous timescale is represented by a capital letter with the time 
shown in brackets, e.g. X (t), while time subscripts are reserved for a discrete 
timescale. Prices are now denoted by S(t), as the letter P will be used to refer 
either to a particular probability or to a probability measure. 


13.2 The Wiener Process 
13.2.1 Properties 


A Wiener or standard Brownian motion process is initially denoted by $\{W(t)\}$
and it consists of a random variable $W(t)$ for all times $t$ that are nonnegative real
numbers. A sample path from the process until time $T$ is any realization of all the
random variables for $0 \le t \le T$.

A Wiener process has four defining properties:

• $W(0) = 0$;
• $W(t) - W(s) \sim N(0, t - s)$ whenever $t > s$;
• $W(v) - W(u)$ is independent of $W(t) - W(s)$ whenever $v > u \ge t > s$;
• $W(t)$ is a continuous process: there are no jumps in its sample paths.


The independence of the increments $W(v) - W(u)$ and $W(t) - W(s)$ is a
random walk property. For any time step $\Delta$, the discrete-time process defined
by $y_n = W(n\Delta)$, for integers $n$, follows a random walk. Figure 13.1 shows a
realization of $\{y_n\}$ when $\Delta = 10^{?}$ and $0 \le n \le 10^{?}$, obtained by simulating
the random walk $y_n = y_{n-1} + \sqrt{\Delta}\, z_n$, with $z_n \sim$ i.i.d. $N(0, 1)$. This realization
is merely an approximation to a sample path from a Wiener process, because it
only defines sample values for a finite number of times.
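The random walk approximation is easy to reproduce. In the sketch below the step size $\Delta = 10^{-3}$ and the 1000 steps per path are assumptions chosen for illustration (they are not taken from Figure 13.1), and the function name `simulate_wiener` is hypothetical.

```python
import numpy as np


def simulate_wiener(delta, n_steps, n_paths, rng):
    """Random walk approximations y_n = y_{n-1} + sqrt(delta) * z_n with y_0 = 0.

    Returns an array of shape (n_paths, n_steps + 1); column n approximates W(n * delta).
    """
    z = rng.standard_normal((n_paths, n_steps))
    increments = np.sqrt(delta) * z
    start = np.zeros((n_paths, 1))
    return np.concatenate([start, np.cumsum(increments, axis=1)], axis=1)


rng = np.random.default_rng(42)
paths = simulate_wiener(delta=1e-3, n_steps=1000, n_paths=2000, rng=rng)
# W(1) - W(0) should be close to N(0, 1) across paths
print("sample mean of W(1):", paths[:, -1].mean())
print("sample variance of W(1):", paths[:, -1].var())
```

Each row is one approximate sample path on the grid $0, \Delta, 2\Delta, \ldots$; the sample variance of the terminal values illustrates the second defining property with $t - s = 1$.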

A sample path from a Wiener process is not a differentiable function of time, 
even though it is continuous. This is only one of many properties that may be 
surprising when they are first encountered. The existence of a process satisfying 
the defining properties is outlined in the appendix to this chapter for the benefit 
of any skeptical readers. 


13.2.2 Remarks about Stochastic Calculus 


The notation dW often appears in equations. Although writers employ it in many 
ways, the only rigorous usage is within stochastic integrals. A detailed discussion 
of these integrals is outside the scope of this book. A simple example is

$\int_0^T dW(t) = W(T)$.

Figure 13.1. One approximation to a sample path from a Wiener process.

A less obvious result is

$\int_0^T W(t)\, dW(t) = \tfrac{1}{2}(W(T)^2 - T)$, (13.1)

which is obtained by finding the limit of

$\sum_{i=0}^{n-1} W(iT/n)\,[W((i+1)T/n) - W(iT/n)]$

as $n$ increases (Baxter and Rennie 1996, Section 3.3). This result emphasizes that
stochastic integration is different to ordinary integration, since the latter gives
$\int y\, dy = T^2/2$ when the integral's limits are 0 and $T$.
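The result (13.1) can be checked numerically with a left-point Riemann sum, which is the discrete analogue of the stochastic integral. The sketch below simulates one discretized path (the grid size and the function name `ito_sum` are assumptions) and uses the algebraic identity $\sum_i W_i (W_{i+1} - W_i) = \tfrac{1}{2} W_n^2 - \tfrac{1}{2} \sum_i (W_{i+1} - W_i)^2$, valid when $W_0 = 0$, to show where the extra term $-T/2$ comes from: the sum of squared increments is close to $T$.

```python
import numpy as np


def ito_sum(w):
    """Left-point Riemann sum: sum_i W_i * (W_{i+1} - W_i)."""
    w = np.asarray(w, dtype=float)
    return float(np.sum(w[:-1] * np.diff(w)))


T, n = 1.0, 100_000
rng = np.random.default_rng(3)
increments = np.sqrt(T / n) * rng.standard_normal(n)
w = np.concatenate([[0.0], np.cumsum(increments)])

quadratic_variation = float(np.sum(np.diff(w) ** 2))
print("left-point sum     :", ito_sum(w))
print("0.5 * (W(T)^2 - T) :", 0.5 * (w[-1] ** 2 - T))
print("sum of squared increments (close to T):", quadratic_variation)
```

Evaluating the integrand at the right-hand end of each interval instead would change the answer by the sum of squared increments, which is one way to see that the choice of evaluation point matters for stochastic integrals.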

From (13.1) we can deduce that $d(W(t)^2) \ne 2W(t)\, dW(t)$. A more important
example for us of stochastic differentiation differing from ordinary differentiation
occurs in the next section when we encounter a consequence of the following
inequality:

$d(\log W(t)) \ne dW(t)/W(t)$.


13.3 Diffusion Processes 


13.3.1 General Processes 


A general time-invariant diffusion process, also called an Itô process, is denoted by 
(X (t)} in this section. It is constructed from a drift function a(X (t)), a volatility 
function b(X (t)), and a Wiener process. The general diffusion process can be 
written as a stochastic differential equation (SDE), 


dX (t) = a(X (1r) dt + b(X (0) aW (t), 
or more compactly as 


dX = a(X) dt + b(X) dW. (13.2) 

The conditional distribution of $X(t+\Delta)$ given $X(t)$ can then be approximated
for small increments $\Delta$ by assuming the drift and volatility are constant, so that

$$X(t+\Delta) - X(t) \mid X(t) \sim N\bigl(a(X(t))\Delta,\; b(X(t))^2\Delta\bigr) \quad \text{approximately.} \qquad (13.3)$$

An equivalent representation of an SDE employs stochastic integrals,

$$X(T) - X(0) = \int_0^T a(X(t))\,dt + \int_0^T b(X(t))\,dW(t),$$

that are defined by limits of Riemann sums. A limit for the second integral is


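$$\lim_{n\to\infty} \sum_{j=0}^{n-1} b\!\left(X\!\left(\tfrac{jT}{n}\right)\right)\left[W\!\left(\tfrac{(j+1)T}{n}\right) - W\!\left(\tfrac{jT}{n}\right)\right].$$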
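The normal approximation (13.3) suggests a simple way to simulate a path on a grid of times: freeze the drift and volatility at the start of each small step and add a normal increment. The sketch below assumes that scheme (usually called an Euler discretization); the function name `euler_path` and the example drift and volatility values are hypothetical.

```python
import math
import random


def euler_path(a, b, x0, T, n, seed=0):
    """Simulate X at times 0, T/n, ..., T using the normal approximation (13.3)."""
    rng = random.Random(seed)
    dt = T / n
    path = [x0]
    for _ in range(n):
        x = path[-1]
        # drift a(x) and volatility b(x) are held constant over the step
        path.append(x + a(x) * dt + b(x) * math.sqrt(dt) * rng.gauss(0.0, 1.0))
    return path


# Hypothetical example: constant drift 0.05 and volatility 0.2, one year, 250 steps.
path = euler_path(lambda x: 0.05, lambda x: 0.2, x0=1.0, T=1.0, n=250)
print(path[-1])
```

Any drift and volatility functions of the current level can be passed in, so the same routine covers the special cases described in the following subsections, subject to the accuracy of the small-step approximation.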


13.3.2 Arithmetic Brownian Motion 


This process is outlined in the famous thesis by Bachelier (1900) on a “Theory of
speculation.” The drift and volatility functions are simply constants, respectively
$\mu$ and $\sigma$. The SDE is therefore

$$dX = \mu\,dt + \sigma\,dW$$

and

$$X(t) - X(0) = \mu t + \sigma W(t).$$

Then $X(t) - X(s) \sim N(\mu(t-s), \sigma^2(t-s))$, whenever $t > s$. This process
cannot be recommended for asset prices because $X(t)$ has a positive probability
of a negative outcome, for any initial value $X(0)$.


13.3.3 Geometric Brownian Motion 


Replacing X by its logarithm in the above equations will ensure that the process
always has positive outcomes. We change the drift rate for $\log(X)$ to $\mu - \sigma^2/2$
and define $\{X(t)\}$ to be geometric Brownian motion (GBM) when

$$\log X(t) - \log X(0) = (\mu - \tfrac{1}{2}\sigma^2)t + \sigma W(t). \qquad (13.4)$$

The distribution of $X(t)$ given $X(s)$ is then lognormal and its conditional expectation
equals $X(s)\exp(\mu(t-s))$. GBM is often used as a simple description of
asset price dynamics, for example, in the derivation of the Black-Scholes option
pricing formula.

The corresponding SDE is

$$d(\log X) = (\mu - \tfrac{1}{2}\sigma^2)\,dt + \sigma\,dW.$$

The famous lemma of Itô (1951), explained in the textbooks listed in Section 13.1,
produces the result

$$d(\log X) = dX/X - \tfrac{1}{2}\sigma^2\,dt.$$

An equivalent SDE for GBM is therefore

$$dX/X = \mu\,dt + \sigma\,dW. \qquad (13.5)$$
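
The conditional expectation quoted after (13.4) can be checked from the standard mean of a lognormal variable, $E[e^{Y}] = \exp(m + v/2)$ when $Y \sim N(m, v)$. A sketch of the step, using the symbols of (13.4):

$$\log X(t) - \log X(s) \sim N\bigl((\mu - \tfrac{1}{2}\sigma^2)(t-s),\; \sigma^2(t-s)\bigr),$$

$$E[X(t) \mid X(s)] = X(s)\exp\bigl((\mu - \tfrac{1}{2}\sigma^2)(t-s) + \tfrac{1}{2}\sigma^2(t-s)\bigr) = X(s)\exp(\mu(t-s)).$$

Reducing the drift rate of $\log(X)$ by $\sigma^2/2$ is therefore what makes $\mu$ the expected growth rate of X itself.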




13.3.4 The CEV Process 


The volatility function of $dX/X$ equals $\sigma$ for GBM. The constant elasticity of
variance (CEV) process of Cox and Ross (1976) permits this function to vary
inversely with the level of asset prices; thus,

$$dX/X = \mu\,dt + \sigma X^{\beta-1}\,dW, \quad 0 < \beta < 1.$$

A deterministic relationship between the asset price and its volatility is very restrictive.
The continuous-time price and volatility processes described in Sections 13.4
and 13.6 are more realistic.


13.3.5 The OU Process 


All the previous examples describe nonstationary processes that have a random 
walk property. Stationary processes require the drift function a(X) to be positive 
when X is below its mean level and to be negative otherwise. The simplest example 
is the Ornstein-Uhlenbeck (OU) process, whose SDE is 


$$dX = \kappa(\alpha - X)\,dt + \sigma\,dW. \qquad (13.6)$$


The positive parameter $\kappa$ determines the rate at which this process is pulled back
towards the mean parameter $\alpha$. The OU process has been used to model the
logarithm of volatility (Scott 1987; Wiggins 1987), because the OU process is a
continuous-time extension of the AR(1) process. Figure 13.2 shows a sample path
for an OU process that reverts towards a mean level of 100; the other parameters
are explained in the paragraph after equation (13.8).

The distribution of X at time t, conditional on its value at an earlier time s, is
normal. The conditional mean is given by

$$E[X(t) \mid X(s)] = \alpha + e^{-\kappa(t-s)}(X(s) - \alpha). \qquad (13.7)$$

These conditional expectations converge to $\alpha$ as t increases; they are mid-way
between $X(s)$ and $\alpha$ when $t - s$ equals the half-life parameter defined by
$\Phi = \log(2)/\kappa$. The conditional variance is independent of $X(s)$ and equals

$$\operatorname{var}(X(t) \mid X(s)) = \frac{\sigma^2}{2\kappa}\bigl(1 - e^{-2\kappa(t-s)}\bigr).$$

The limit of the conditional distributions as $t \to \infty$ is $N(\alpha, \psi^2)$ with
$\psi^2 = \sigma^2/(2\kappa)$. When $X(0)$ has this distribution, firstly it is also the unconditional
distribution of each random variable $X(t)$ and secondly the correlation between
$X(s)$ and $X(t)$ equals $\exp(-\kappa|t-s|)$.

The OU process is the continuous-time equivalent of the AR(1) process. For a
selected time increment $\Delta$, let $\phi = \exp(-\kappa\Delta)$. Then

$$X(s+\Delta) \mid X(s) \sim N\bigl(\alpha + \phi(X(s) - \alpha),\; (1 - \phi^2)\psi^2\bigr). \qquad (13.8)$$

Figure 13.2. Sample paths for three mean-reverting diffusion processes.


Consequently, the discrete-time process $y_n = X(n\Delta)$ is a Gaussian, AR(1)
process with autoregressive parameter $\phi$, mean $\alpha$, and variance $\psi^2$, assuming
$y_0 \sim N(\alpha, \psi^2)$.


13.3.6 Simulated Paths 


The three curves in Figure 13.2 show simulated paths for three mean-reverting 
diffusion processes that might be considered as models for the annualized variance 
of prices. All these processes are stationary with mean equal to 100, corresponding 
to an annualized volatility of 10%. Each process has standard deviation equal 
to 80 and a half-life of three months, so that $\Phi = 0.25$ and $\kappa = 2.77$. They
all have conditional expectations defined by equation (13.7). The path for the
OU process is shown by the dark, solid curve. It is obtained by simulating the
distributions specified by equation (13.8), with $\Delta = 0.005$ and $\phi = 0.986$. The
same realizations of a sequence of standard normal variables are used to construct 
all three sample paths. 
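
The simulation recipe for the OU path can be sketched as follows. The mean, standard deviation, half-life, and time increment are the values quoted above; the function name `ou_path`, the seed, and the choice to start from the stationary distribution are assumptions made for illustration, so the output will not reproduce the curve in Figure 13.2.

```python
import math
import random

alpha, psi = 100.0, 80.0            # stationary mean and standard deviation
half_life = 0.25                    # three months, measured in years
kappa = math.log(2.0) / half_life   # mean-reversion rate implied by the half-life
delta = 0.005                       # time increment
phi = math.exp(-kappa * delta)      # AR(1) parameter of the discrete-time process


def ou_path(n_steps, seed=0):
    """Simulate X(0), X(delta), ..., X(n_steps * delta) from equation (13.8)."""
    rng = random.Random(seed)
    x = rng.gauss(alpha, psi)                 # start from the stationary distribution
    path = [x]
    sd = psi * math.sqrt(1.0 - phi ** 2)      # conditional standard deviation
    for _ in range(n_steps):
        x = alpha + phi * (x - alpha) + sd * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


print(round(kappa, 2), round(phi, 3))   # 2.77 0.986
path = ou_path(200)                     # one simulated year
```

Because (13.8) is the exact conditional distribution, this recursion introduces no discretization error, whatever the size of the time increment.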


13.3.7 The Square-Root Process 


The OU process can attain negative values, so it is inappropriate for modeling
positive variables. Reducing the volatility function $b(X)$ as X approaches zero can
ensure that X remains positive. If we retain the drift function of the OU process
and change the volatility function to $\xi\sqrt{X}$, we obtain the square-root process of
Cox, Ingersoll, and Ross (1985):

$$dX = \kappa(\alpha - X)\,dt + \xi\sqrt{X}\,dW. \qquad (13.9)$$

The realizations of the random variables $X(t)$ are always positive when the process
commences at a positive value. The constraint $2\kappa\alpha \ge \xi^2$ is required to avoid
sample paths that converge to zero.

The conditional distribution $X(t) \mid X(s)$ is now a noncentral chi-squared distribution.
The conditional mean is again given by equation (13.7). The conditional
variance is a linear function of $X(s)$:

$$\operatorname{var}(X(t) \mid X(s)) = (2\kappa)^{-1}\xi^2\bigl(1 - e^{-\kappa(t-s)}\bigr)\bigl[\alpha + e^{-\kappa(t-s)}(2X(s) - \alpha)\bigr].$$

As $t \to \infty$, the mean and variance respectively converge to $\alpha$ and $\psi^2 = \alpha\xi^2/(2\kappa)$.
The limit of the conditional distributions is a gamma distribution, whose density
is

$$f(z) = \frac{\beta^{\gamma}}{\Gamma(\gamma)}\, z^{\gamma-1} e^{-\beta z}, \quad z > 0, \qquad (13.10)$$

with $\gamma = 2\kappa\alpha/\xi^2$ and $\beta = 2\kappa/\xi^2$ (Cox et al. 1985).

The dotted curve in Figure 13.2 shows a simulated path from the square-root
process, for the parameter values $\alpha = 100$, $\kappa = 2.77$, and $\xi = 18.8$.
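
The volatility parameter used for the dotted curve can be recovered from the target mean, standard deviation, and half-life. The sketch below assumes the stationary variance of the square-root process equals the mean times the squared volatility parameter divided by twice the reversion rate, as stated above; the function name `cir_conditional_moments` is hypothetical.

```python
import math

alpha, psi = 100.0, 80.0           # target stationary mean and standard deviation
kappa = math.log(2.0) / 0.25       # reversion rate for a half-life of three months

# Volatility parameter that gives stationary variance alpha * xi**2 / (2 * kappa) = psi**2
xi = math.sqrt(2.0 * kappa * psi ** 2 / alpha)


def cir_conditional_moments(x_s, tau):
    """Conditional mean and variance of X(s + tau) given X(s) = x_s."""
    e = math.exp(-kappa * tau)
    mean = alpha + e * (x_s - alpha)
    var = xi ** 2 / (2.0 * kappa) * (1.0 - e) * (alpha + e * (2.0 * x_s - alpha))
    return mean, var


print(round(xi, 1))                        # 18.8
print(2.0 * kappa * alpha >= xi ** 2)      # True: the positivity constraint holds
print(cir_conditional_moments(50.0, 0.25))
```

For a long horizon the conditional moments returned by the function approach the stationary mean and variance, whatever the starting level.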


13.4 Bivariate Diffusion Processes 
13.4.1 The Bivariate Wiener Process 


Suppose that $\{W(t)\}$ is a Wiener process that is independent of a second Wiener
process $\{Y(t)\}$. For any value of a correlation parameter $\rho$ between $-1$ and 1, let
us define a third Wiener process by

$$Z(t) = \rho W(t) + \sqrt{1 - \rho^2}\,Y(t). \qquad (13.11)$$

The stochastic process whose variables are the column vector

$$B(t) = (W(t), Z(t))'$$

defines the general bivariate Wiener process. The parameter $\rho$ then equals the
correlation between the increments of the component processes:

$$\operatorname{cor}(W(t) - W(s), Z(t) - Z(s)) = \rho.$$

The correlation parameter is often stated within a more compact equation:

$$dW\,dZ = \rho\,dt.$$
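
The construction of the correlated process can be illustrated by simulating increments over small time steps. This is a minimal sketch with hypothetical function names, a hypothetical correlation of -0.5, and daily steps of 1/250 years; it only checks that the sample correlation of the increments is close to the chosen parameter.

```python
import math
import random


def correlated_increments(rho, dt, n, seed=0):
    """Increments of W and Z over n steps of length dt, with Z built as in (13.11)."""
    rng = random.Random(seed)
    sd = math.sqrt(dt)
    dw, dz = [], []
    for _ in range(n):
        w = rng.gauss(0.0, sd)    # increment of W
        y = rng.gauss(0.0, sd)    # increment of the independent process Y
        dw.append(w)
        dz.append(rho * w + math.sqrt(1.0 - rho ** 2) * y)
    return dw, dz


def sample_correlation(u, v):
    """Ordinary sample correlation of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


dw, dz = correlated_increments(rho=-0.5, dt=1.0 / 250.0, n=50_000)
print(sample_correlation(dw, dz))   # should be near -0.5
```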


13.4.2 Examples 


The stochastic volatility (SV) models of Chapter 11 specify a volatility equation 
and then employ the volatility variable in an equation that defines returns. An SV 
model is an approximation to a bivariate diffusion process for prices S(t) and 
their stochastic variance V (t), defined by the pair of equations: 


$$dS/S = \mu\,dt + \sqrt{V}\,dW \qquad (13.12)$$

and

$$dV = a(V)\,dt + b(V)\,dZ. \qquad (13.13)$$

The price equation above combined with an OU process for $\{\log V(t)\}$ can be
approximated by the asymmetric SV model of Section 11.9, which simplifies to
the standard SV model of Section 11.5 when $\rho = 0$. The alternative specification
of a square-root process for $\{V(t)\}$ has the advantage that closed-form option
prices can then be calculated for any value of $\rho$, as we will see in Section 14.6.
A negative correlation $\rho$ is appropriate for equity indices, as volatility tends to
increase when index levels fall.


13.4.3 Limits of ARCH Processes 


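ARCH models are also approximations to bivariate diffusion processes. To illustrate
this, we follow Nelson (1990b) and consider a sequence of GARCH(1, 1)-M
models. The symbols $\mu$, $\lambda$, $\omega$, $\alpha$, $\beta$ now represent ARCH model parameters.

Let $\delta$ be a specific time increment measured in years, which might represent
one day, and let discrete-time prices $\{S_{t,\delta},\ t = 0, \delta, 2\delta, \ldots\}$ and their conditional
variances $\{h_{t,\delta}\}$ be represented by

$$\log(S_{t,\delta}) - \log(S_{t-\delta,\delta}) = \mu\delta + \lambda h_{t,\delta} + \sqrt{h_{t,\delta}}\,z_{t,\delta}$$

and

$$h_{t,\delta} = \omega + h_{t-\delta,\delta}(\alpha z_{t-\delta,\delta}^2 + \beta)$$

with $z_{t,\delta} \sim$ i.i.d. $N(0, 1)$. For a general time increment $\Delta < \delta$, we construct a
related model for a process $\{S_{t,\Delta}\}$ and its conditional variances $\{h_{t,\Delta}\}$, defined by

$$\log(S_{t,\Delta}) - \log(S_{t-\Delta,\Delta}) = \mu\Delta + \lambda h_{t,\Delta} + \sqrt{h_{t,\Delta}}\,z_{t,\Delta} \qquad (13.14)$$

and

$$h_{t,\Delta} = \omega_\Delta + h_{t-\Delta,\Delta}(\alpha_\Delta z_{t-\Delta,\Delta}^2 + \beta_\Delta) \qquad (13.15)$$

with

$$\omega_\Delta = \omega\Delta^2/\delta^2, \quad \alpha_\Delta = \alpha\sqrt{\Delta/\delta}, \quad \beta_\Delta = 1 - \alpha_\Delta - (1 - \alpha - \beta)\Delta/\delta$$

and $z_{t,\Delta} \sim$ i.i.d. $N(0, 1)$. The annualized conditional variance,

$$V_{t,\Delta} = h_{t,\Delta}/\Delta,$$

can then be shown to change as follows:

$$V_{t,\Delta} - V_{t-\Delta,\Delta} = \left[\frac{\omega}{\delta^2} - \frac{(1 - \alpha - \beta)V_{t-\Delta,\Delta}}{\delta}\right]\Delta + \alpha\left(\frac{2}{\delta}\right)^{1/2} V_{t-\Delta,\Delta}\left[\left(\frac{\Delta}{2}\right)^{1/2}(z_{t-\Delta,\Delta}^2 - 1)\right].$$

The quantity inside the second square bracket has mean and variance respectively
equal to zero and $\Delta$.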

A continuous-time model for time increment $\Delta$ can be defined by supposing
that prices and conditional variances only change at times that are multiples of $\Delta$.
Nelson (1990b) proves that these continuous-time models converge as $\Delta \to 0$,
in distribution, to the bivariate diffusion

$$d(\log S) = (\mu + \lambda V)\,dt + \sqrt{V}\,dW \qquad (13.16)$$

$$dV = \left[\frac{\omega}{\delta^2} - \frac{(1 - \alpha - \beta)V}{\delta}\right]dt + \alpha\left(\frac{2}{\delta}\right)^{1/2} V\,dZ. \qquad (13.17)$$

The Wiener processes W and Z are independent, because there is no correlation
between the innovation terms $z_{t,\Delta}$ and $z_{t,\Delta}^2 - 1$ (Gourieroux and Jasiak 2001,
p. 258).

The variance process V given by the limit of GARCH(1, 1) models has the same
linear drift function as the OU and the square-root processes, but the term that
multiplies the differential dZ is now proportional to V. The light, solid curve in
Figure 13.2 shows a sample path for the process $X = 10^4 V$ when the parameters
of V are chosen so that the mean, standard deviation, and half-life of X are
respectively 100, 80, and 0.25. With $\delta = 1/250$, the ARCH parameters are
$\omega = 4.43 \times 10^{-7}$, $\alpha = 0.066$, and $\beta = 0.923$.
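
The quoted ARCH parameters can be checked against the stated mean, standard deviation, and half-life. The sketch below reads the reversion rate and long-run level from the drift of the limiting variance diffusion. The stationary variance formula in the code is not taken from the text: it is an assumption obtained by applying Itô's lemma to the squared variance of a process with linear drift and volatility proportional to its level, and it is valid only when twice the reversion rate exceeds the squared volatility coefficient.

```python
import math

omega, alpha, beta = 4.43e-7, 0.066, 0.923   # ARCH parameters quoted in the text
delta = 1.0 / 250.0                          # one trading day, in years

kappa = (1.0 - alpha - beta) / delta         # reversion rate in the drift of V
mean_v = omega / (delta ** 2 * kappa)        # level of V at which the drift is zero
gamma_sq = 2.0 * alpha ** 2 / delta          # squared coefficient multiplying V dZ

# Assumed stationary variance for dV = kappa (m - V) dt + gamma V dZ,
# which requires 2 * kappa > gamma**2.
var_v = mean_v ** 2 * gamma_sq / (2.0 * kappa - gamma_sq)

scale = 1.0e4                                # X = 10^4 V
mean_x = scale * mean_v
sd_x = scale * math.sqrt(var_v)
half_life = math.log(2.0) / kappa
print(mean_x, sd_x, half_life)               # close to 100, 80, and 0.25
```

The small differences from 100, 80, and 0.25 are consistent with the ARCH parameters having been rounded to three significant figures.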

Likewise, it can be shown that a sequence of EGARCH(1) models, based upon 
equation (10.1), converges to a bivariate diffusion with the logarithm of the vari- 
ance process V following an OU process (Nelson 1990b; Bollerslev et al. 1994). 
We note that the same diffusion limit arises for stochastic volatility models, as 
mentioned after equation (13.13). For the volatility residual function described 
by equation (10.2), say 


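$$g_\Delta(z_\Delta) = \vartheta_\Delta z_\Delta + \gamma_\Delta\bigl(|z_\Delta| - \sqrt{2/\pi}\bigr)$$

for time increment $\Delta$, the correlation between the limiting Wiener processes W
and Z is equal to

$$\rho = \operatorname{cor}(z_\Delta, g_\Delta(z_\Delta)) = \vartheta_\Delta/[\vartheta_\Delta^2 + \gamma_\Delta^2(1 - 2/\pi)]^{1/2}, \qquad (13.18)$$

assuming that the ratio $\vartheta_\Delta/\gamma_\Delta$ is a constant. Consequently, correlated Wiener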
processes occur in diffusion limits when the ARCH volatility residual function 
is asymmetric. Diffusion limits for some other ARCH processes are provided in 
Duan (1997). 


13.5 Jump Processes 


The sample paths of diffusion processes are continuous functions of time. In 
contrast, the sample paths of jump processes only change at discrete jump times. 

13.5.1 Finite Activity Processes


The Poisson process is the simplest example of a jump process. The random
variable $N_\lambda(t)$ counts the number of jump times between times 0 and t inclusive,
for some finite intensity rate $\lambda$. The process has the following properties.


* The expected number of jumps in any time interval of length $\Delta$ is equal to
$\lambda\Delta$.


* $N_\lambda(t) - N_\lambda(s)$ has a Poisson distribution, with mean $\lambda(t - s)$ for all
$t > s \ge 0$.


* $N_\lambda(v) - N_\lambda(u)$ is independent of all random variables obtained from
$\{N_\lambda(t), 0 \le t \le u\}$ for all $v > u$. In particular, it is independent of
$N_\lambda(t) - N_\lambda(s)$ whenever $v > u \ge t > s$.


For any sample path from time zero until any time t, the duration of time until the
next jump has an exponential distribution, with mean equal to $1/\lambda$ and density
function

$$f(z) = \lambda e^{-\lambda z}, \quad z \ge 0. \qquad (13.19)$$
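
The exponential waiting-time property gives a direct way to simulate the jump times of a Poisson process. The sketch below is an illustration under assumed values (intensity 3 per year, a one-year horizon, 2000 replications); the function name `poisson_jump_times` is hypothetical.

```python
import random


def poisson_jump_times(lam, T, seed=0):
    """Jump times in (0, T] for a Poisson process with intensity rate lam."""
    rng = random.Random(seed)
    times, t = [], 0.0
    while True:
        t += rng.expovariate(lam)   # waiting time with mean 1 / lam
        if t > T:
            return times
        times.append(t)


# Average number of jumps per year over many independent paths.
counts = [len(poisson_jump_times(3.0, 1.0, seed=s)) for s in range(2000)]
average = sum(counts) / len(counts)
print(average)                      # should be near lam * T = 3
```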


The compound Poisson process, here denoted by $\{X(t)\}$, commences at $X(0) = 0$
and has a general distribution for the jump sizes. The jump size is an i.i.d. variable
$j_n$ when jump n occurs for a Poisson process $\{N_\lambda(t)\}$. Let $J(t)$ be the appropriate
random jump $j_n$ if there is a jump at time t and otherwise let $J(t)$ be zero. Then

$$X(t) = \sum_{0 \le s \le t} J(s)$$

and the SDE for X is given by

$$dX(t) = J(t)\,dN_\lambda(t). \qquad (13.20)$$

It is assumed that the jump size process $\{j_n\}$ is independent of $\{N_\lambda(t)\}$.
A specific example of a jump process for the logarithm of asset prices is given
by

$$d(\log S(t)) = \mu\,dt + J(t)\,dN_\lambda(t), \qquad (13.21)$$

with $J(t) \sim N(\mu_J, \sigma_J^2)$ when t is a jump time.
The SDE in (13.21) is identical to

$$dS/S = \mu\,dt + (e^J - 1)\,dN_\lambda, \qquad (13.22)$$

as the price is multiplied by $\exp(J(t))$ at a jump time. Also $E[S(t) \mid S(s)]$ equals
$S(s)\exp(\xi(t - s))$ with $\xi = \mu + \lambda(E[\exp(j_n)] - 1)$. The corresponding discrete-time
process for prices is similar to the compound events model of Press (1967)
and it is also a special case of the information arrivals model of Section 8.3.

Compound Poisson processes are finite activity processes, as they have a finite
number of jumps within any finite time interval. Suppose now that the jump sizes
$j_n$ are continuous random variables with density function $f(y)$. Then $g(y) = \lambda f(y)$
is called the Lévy measure of the process by some writers and the Lévy
density by others; the arrival (or intensity) rate of jumps that have sizes between
$y - \delta/2$ and $y + \delta/2$ is approximately equal to $\delta g(y)$, when $\delta$ is small.
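
For normally distributed jump sizes, both the expected growth rate of the price and the Lévy density have closed forms. The sketch below assumes the standard lognormal mean for the expected value of the exponential of a normal jump; the function names and the numerical inputs are hypothetical.

```python
import math


def expected_growth_rate(mu, lam, mu_j, sigma_j):
    """Growth rate of E[S(t) | S(s)] when jump sizes are N(mu_j, sigma_j**2)."""
    mean_exp_jump = math.exp(mu_j + 0.5 * sigma_j ** 2)   # mean of exp(jump)
    return mu + lam * (mean_exp_jump - 1.0)


def levy_density(y, lam, mu_j, sigma_j):
    """Intensity rate times the normal density of the jump size, evaluated at y."""
    z = (y - mu_j) / sigma_j
    return lam * math.exp(-0.5 * z * z) / (sigma_j * math.sqrt(2.0 * math.pi))


# Hypothetical inputs: three jumps a year on average, typically downward.
rate = expected_growth_rate(mu=0.10, lam=3.0, mu_j=-0.05, sigma_j=0.03)
print(rate)   # below 0.10 because the average jump lowers the price

# Approximate arrival rate of jumps with sizes within 0.005 of -0.05.
width = 0.01
print(width * levy_density(-0.05, 3.0, -0.05, 0.03))
```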


13.5.2 Infinite Activity Processes 


It is possible to define jump processes that have an infinite number of jumps within
any finite time interval. The Lévy measure $g(y)$ can still be defined for jump sizes
that have continuous distributions by the property that the arrival rate of jumps of
sizes between a and b is given by

$$\int_a^b g(y)\,dy,$$

when zero is outside the range from a to b; the arrival rate is infinite whenever
$a < 0 < b$, so the process has infinitely many “small” jumps.

Several examples of Lévy measures are provided in Table 1 of Carr and Wu
(2004). One example is the four-parameter measure of Carr, Geman, Madan, and
Yor (2002), defined by

$$g(y) = \begin{cases} C e^{-My}\, y^{-(1+Y)}, & y > 0, \\ C e^{Gy}\, (-y)^{-(1+Y)}, & y < 0, \end{cases}$$

with $C > 0$, $G \ge 0$, $M \ge 0$, and $Y < 2$. The CGMY jump process has infinite
activity when $Y \ge 0$ and finite activity when $Y < 0$.
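
The growth of the arrival rate as the range of jump sizes approaches zero can be seen numerically. The sketch below evaluates the CGMY density and integrates it with a simple midpoint rule; the parameter values are hypothetical and chosen only for illustration.

```python
import math


def cgmy_density(y, C, G, M, Y):
    """CGMY Levy density at a nonzero jump size y."""
    if y > 0:
        return C * math.exp(-M * y) * y ** (-(1.0 + Y))
    if y < 0:
        return C * math.exp(G * y) * (-y) ** (-(1.0 + Y))
    raise ValueError("the density is not defined at y = 0")


def arrival_rate(a, b, C, G, M, Y, n=20_000):
    """Midpoint-rule approximation to the integral of the density over [a, b]."""
    if a <= 0.0 <= b:
        raise ValueError("zero must lie outside the interval")
    h = (b - a) / n
    return h * sum(cgmy_density(a + (i + 0.5) * h, C, G, M, Y) for i in range(n))


# Hypothetical parameter values.
params = dict(C=1.0, G=5.0, M=5.0, Y=0.5)
rates = [arrival_rate(eps, 1.0, **params) for eps in (0.1, 0.01, 0.001)]
print(rates)   # the rate grows as the lower limit shrinks towards zero
```

With a positive value of Y the printed rates keep increasing as the lower limit falls, which is the numerical counterpart of infinitely many small jumps.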


13.6 Jump-Diffusion Processes 
13.6.1 Lévy Processes 


Arithmetic Brownian motion and the compound Poisson process are examples of 
continuous-time random walks. Random walks have stationary and independent 
increments. Any continuous-time random walk $\{X(t)\}$ is called a Lévy process; it
commences at $X(0) = 0$, the increment $X(t) - X(s)$ is independent of $\{X(r), 0 \le
r \le s\}$, and the distribution of $X(t) - X(s)$ is identical to that of $X(t - s)$, with
$0 \le s < t$. The most general Lévy process can be expressed as the sum of an
arithmetic Brownian motion process and an independent jump process; see Cont 
and Tankov (2003) for further discussion of this result and many others for Lévy 
processes. 

Recently, several interesting jump-diffusion processes for asset prices have
been constructed from Lévy processes. Typically, at least two Lévy processes are
used, to ensure that price models incorporate stochastic volatility effects. These
constructions are interesting for two reasons. First, they offer new insights into
the dynamics of observed prices (Barndorff-Nielsen and Shephard 2001, 2005a,b;
Eraker, Johannes, and Polson 2003). Second, they provide flexible and convenient
mathematical structures when option contracts are priced. These structures involve
a time-change of a Lévy process, from calendar time to a new timescale that
represents cumulative economic activity (Carr, Geman, Madan, and Yor 2003;
Carr and Wu 2004; Huang and Wu 2004).

Figure 13.3. Sample path from a jump-diffusion process.


13.6.2 Examples 


Sample paths that are continuous, except for jumps that occur at a finite rate, are
obtained from the sum of a diffusion process and a finite-activity jump process.
An example for the logarithm of prices, proposed in Merton (1976), is given by the
sum of an arithmetic Brownian motion process and a compound Poisson process,

$$d(\log S) = (\mu - \tfrac{1}{2}\sigma^2)\,dt + \sigma\,dW + J\,dN_\lambda, \qquad (13.23)$$

for a Wiener process $\{W(t)\}$ that is independent of a compound Poisson process
$\{J(t), N_\lambda(t)\}$. Figure 13.3 shows a sample path for $\{S(t)\}$ when $\mu = 0.21$,
$\sigma = 0.1$, $\lambda = 3$ and the jumps are normally distributed with mean and standard
deviation equal to $-0.05$ and 0.03. There are four jumps on this path, three of
which are clearly visible near times 0.36, 0.47, and 0.61.
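
A path of this kind can be simulated by combining normal diffusion increments with jumps that arrive after exponential waiting times. The sketch below uses the parameter values quoted above; the function name `merton_path`, the grid of 1000 steps, the starting price, and the seed are assumptions, so the output will not reproduce the path in Figure 13.3.

```python
import math
import random


def merton_path(s0, mu, sigma, lam, mu_j, sigma_j, T=1.0, n=1000, seed=0):
    """Simulate prices on an n-step grid and count the jumps that occurred."""
    rng = random.Random(seed)
    dt = T / n
    log_s = math.log(s0)
    path = [s0]
    next_jump = rng.expovariate(lam)      # time of the first jump
    n_jumps = 0
    for i in range(1, n + 1):
        t = i * dt
        # diffusion part of the change in log price over one step
        log_s += (mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        while next_jump <= t:             # add every jump that falls in this step
            log_s += rng.gauss(mu_j, sigma_j)
            n_jumps += 1
            next_jump += rng.expovariate(lam)
        path.append(math.exp(log_s))
    return path, n_jumps


path, n_jumps = merton_path(s0=100.0, mu=0.21, sigma=0.1, lam=3.0, mu_j=-0.05, sigma_j=0.03)
print(n_jumps, path[-1])
```

Jumps are attached to the end of the grid step in which they occur, which is a simplification that matters little when the steps are short.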

A realistic continuous-time model must also incorporate a stochastic process
for the volatility of prices. Motivated by the existence of closed-form theoretical
option prices, Bates (1996) and Scott (1997) add price jumps to a bivariate diffusion
process for prices $S(t)$ and their stochastic variance $V(t)$, with the variance
defined by a square-root process:

$$d(\log S) = (\mu - \tfrac{1}{2}V)\,dt + \sqrt{V}\,dW + J\,dN_\lambda, \qquad (13.24)$$

$$dV = \kappa(\alpha - V)\,dt + \xi\sqrt{V}\,dZ,$$

and $J(t) \sim N(\mu_J, \sigma_J^2)$ when t is a jump time.

Now $\{W(t), Z(t)\}$ is a bivariate Wiener process that is independent of the jump
process. This bivariate process for the price and its variance has parameters $\mu$, $\kappa$,
$\alpha$, $\xi$, $\lambda$, $\mu_J$, $\sigma_J$, and $\rho = \operatorname{cor}(W(t), Z(t))$.

When model structures based upon (13.24) are estimated from US stock index 
data, as in Bakshi, Cao, and Chen (1997), Bates (2000), and Pan (2002), evidence 
of mis-specification is found, according to Eraker et al. (2003). This problem can 
be attributed to the diffusion specification for volatility, which does not permit the 
rapid increases in volatility that are often estimated from ARCH models. Similar 
models are estimated by Andersen, Benzoni, and Lund (2002), who suppose the 
jump intensity rate $\lambda$ is a linear function of the variance V. A specification that
adds a second volatility factor to (13.24) is estimated from the realized volatility 
of DM/$ rates by Bollerslev and Zhou (2002). 

Duffie, Pan, and Singleton (2000) propose a more general model for prices and
their variances that includes finite-activity jumps in both prices and volatility. The
volatility jumps are always positive. Three types of jumps are proposed: jumps
in $\log(S)$, jumps in V, and simultaneous correlated jumps in both $\log(S)$ and V.
Special cases of an almost identical model are estimated in the interesting paper
by Eraker et al. (2003). These special cases are all defined by

$$d(\log S) = \mu\,dt + \sqrt{V}\,dW + J^S\,dN^S \qquad (13.25)$$

and

$$dV = \kappa(\alpha - V)\,dt + \xi\sqrt{V}\,dZ + J^V\,dN^V.$$

The volatility jumps arrive at a rate $\eta$ and their sizes $J^V(t)$ have an exponential
distribution. The SVIJ model has independent price jumps that arrive at a rate $\lambda$
and the sizes $J^S(t)$ once more have a normal distribution. The SVCJ model, in
contrast, has contemporaneous price and volatility jumps, so that $N^S(t) = N^V(t)$
for all times t. The distribution of the price jump then has a conditional mean that
is a linear function of the volatility jump.

Eraker et al. (2003) apply MCMC estimation methodology to daily returns
from the S&P 500 index, from 1980 to 1999. This methodology provides posterior
distributions for all the parameters and also for the outcomes from all the jump
variables. For the SVCJ model, the estimate of the jump frequency $\lambda$ is 1.5 jumps
per annum. Most of the price jumps are negative and their average level reduces
the index by 3%. An average volatility jump, when volatility is near its median
level, lifts the annualized volatility from 15% to 24%. The posterior probability of
jumps on a few days exceeds 0.9. It is almost one on the crash day of 19 October
1987, when it is estimated that the price jump reduces the index by 14% and the
contemporaneous volatility jump lifts volatility from 40% to 50%.

Chernov, Gallant, Ghysels, and Tauchen (2003) estimate several jump-diffusion 
models, using daily returns on the DJIA index from 1953 to 1999. They observe 
that abrupt changes in volatility are an essential ingredient of a successful model. 

Barndorff-Nielsen and Shephard (2001, 2002b) develop a theoretical framework
that allows the volatility jump sizes, $J^V(t)$, to have a general positive distribution
instead of the exponential distribution of Duffie et al. (2000). It is assumed
that all increases in volatility occur at the jump times, with volatility decaying 
exponentially between the jumps. The price jumps are assumed to be perfectly 
correlated with the volatility jumps. 

Barndorff-Nielsen and Shephard (2001) recommend a general volatility process
that defines $\{V(t)\}$ to be a weighted sum of several independent processes
$\{V_i(t)\}$ that have differing drift rates, jump intensity rates, and jump distributions.
They provide an impressive empirical analysis of one year of five-minute returns 
from the DM/$ exchange rate. Their estimates for a four-component volatility 
model identify a dominant short-term component and three persistent compo- 
nents that have long half-lives. This model provides a satisfactory explanation of 
the autocorrelations of squared five-minute returns, for time lags varying from 
five minutes to 100 days. 


Further Reading 


Eraker, B., M. Johannes, and N. G. Polson. 2003. The impact of jumps in volatility and
returns. Journal of Finance 58:1269-1300.


13.7 Appendix: a Construction of the Wiener Process 


The existence of a Wiener process can be demonstrated in many ways. We do
this by considering the limit of a sequence of interdependent step processes, all
defined for $0 \le t \le 1$.

The n-step process, denoted by $\{W_n(t)\}$, consists of a random variable $W_n(t)$
for all times in the unit interval, with steps at the times $t = 1/n, 2/n, \ldots, 1$. The
process starts at zero. At each step, the random variable changes by an independent
quantity $Z_{n,j}$, that has a normal distribution with mean zero and variance $1/n$.
Between the steps, the random variables are identical. Thus,

$$W_n(0) = 0,$$

$$W_n\!\left(\frac{j}{n}\right) = W_n\!\left(\frac{j-1}{n}\right) + Z_{n,j}, \quad Z_{n,j} \sim \text{i.i.d. } N\!\left(0, \frac{1}{n}\right), \quad 1 \le j \le n,$$

$$W_n(t) = W_n\!\left(\frac{j-1}{n}\right) \quad \text{whenever} \quad \frac{j-1}{n} \le t < \frac{j}{n}, \quad 1 \le j \le n.$$
n n n 

Figure 13.4. One path from a 16-step process.


Figure 13.5. One path from a 256-step process. 


Figure 13.4 shows one sample path from the 16-step process that contains 16 
jumps, obtained by replacing the random steps by sample values taken from 
N (0, 1/16). 

It is constructive to focus on the sequence of processes defined by $n = 2^m$ for
positive integers m. We define these processes to be interdependent, by requiring
$W_n$ and $W_{2n}$ to be identical at the step times of $W_n$:

$$W_{2n}(t) = W_n(t) \quad \text{for } t = 1/n, 2/n, \ldots, 1.$$

The necessary collection of constraints is

$$Z_{n,j} = Z_{2n,2j-1} + Z_{2n,2j}, \quad 1 \le j \le n, \quad n = 2^m, \quad m \ge 1.$$

The conditional distribution of $Z_{2n,2j-1}$, given $Z_{n,j}$, is then normal with mean
$Z_{n,j}/2$ and variance $1/(4n)$. Figures 13.4 and 13.5 show sample paths for the
interdependent processes $W_{16}$ and $W_{256}$. It may be agreed that these paths will
converge to some limit if we continue to increase m.

The Wiener process $\{W(t), 0 \le t \le 1\}$ is here defined as the limit of the step
processes $\{W_n(t)\}$, $n = 2^m$, as $m \to \infty$. Although there are several ways to
define convergence for random variables, the precise method is not important in
this context. All the multivariate probability functions converge, so

$$P(W_n(t_1) \le x_1, \ldots, W_n(t_k) \le x_k) \to P(W(t_1) \le x_1, \ldots, W(t_k) \le x_k)$$

as $m \to \infty$, for all times $t_1, \ldots, t_k$ and all possible outcomes $x_1, \ldots, x_k$. Similarly,
the n-step sample paths converge to a sample path from the Wiener process.

Once we have a definition of $\{W(t), 0 \le t \le 1\}$ we can define interdependent
step processes $\{W_n(t)\}$ for all positive integers n by simply requiring $W_n(t) =
W(t)$ for $t = 1/n, 2/n, \ldots, 1$.
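
The refinement from an n-step process to a 2n-step process can be sketched in code. The example below starts from 16 steps and refines four times to reach 256 steps, drawing the first half of each split step from the conditional normal distribution described above. The function names `refine` and `cumulative` and the seed are hypothetical.

```python
import math
import random


def refine(steps, rng):
    """Split each step of an n-step process into two steps of a 2n-step process."""
    n = len(steps)
    finer = []
    for z in steps:
        # first half-step given the whole step: mean z / 2, variance 1 / (4n)
        first = rng.gauss(z / 2.0, math.sqrt(1.0 / (4.0 * n)))
        finer.extend([first, z - first])   # the two half-steps add up to z
    return finer


def cumulative(steps):
    """Values of the step process at times 0, 1/n, ..., 1."""
    values = [0.0]
    for z in steps:
        values.append(values[-1] + z)
    return values


rng = random.Random(0)
steps16 = [rng.gauss(0.0, math.sqrt(1.0 / 16.0)) for _ in range(16)]
steps256 = steps16
for _ in range(4):                         # 16 -> 32 -> 64 -> 128 -> 256 steps
    steps256 = refine(steps256, rng)

w16, w256 = cumulative(steps16), cumulative(steps256)
print(w16[-1], w256[-1])                   # the two paths end at the same value
```

The finer path agrees with the coarser path at every step time of the coarser process, which is the interdependence that the construction requires.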


14 


Option Pricing Formulae 


This chapter reviews the determination of rational option prices for a variety 
of stochastic processes for the underlying asset price. These option prices often 
depend on a volatility risk premium. The chapter also covers the inverse problem of 
using observed asset and option prices to obtain implied levels of future volatility. 


14.1 Introduction 


Option prices are a source of valuable information about the distributions of 
future asset prices. This motivates our interest in option pricing formulae, which 
are needed to extract and interpret predictive information from the market prices 
of options. This chapter includes several pricing formulae for European options, 
all of which can be obtained by a risk-neutral valuation methodology. Rational 
option prices were first derived by Black and Scholes (1973) and Merton (1973), 
who assumed asset prices follow a geometric Brownian motion (GBM) process. 
An option price reveals the future level of volatility when all their assumptions 
are correct. 

Option pricing formulae that are consistent with the empirical stylized facts 
for asset prices were subsequently derived. An essential property that needs to be 
incorporated into the option pricing framework is stochastic volatility. This was 
accomplished by Hull and White (1987), among others. Stochastic volatility solu- 
tions incorporate a volatility risk premium parameter. This is necessary because 
an option cannot be replicated by a dynamic hedge portfolio, constructed from 
the underlying asset and a risk-free security, when volatility is stochastic. 

Section 14.2 reviews standard definitions and notation for option contracts. 
Section 14.3 presents the famous Black-Scholes formulae and explains how they 
can be derived from risk-neutral price dynamics. Information about the organi- 
zation of options trading and a comprehensive analysis of Black-Scholes pricing 
can be found in several textbooks, including Hull (2000) and Kolb (1999). 

The levels of volatility implied by a Black-Scholes formula and empirical 
option prices are defined in Section 14.4 and their stylized facts are also dis- 
cussed. Section 14.5 covers pricing formulae when volatility follows a continuous- 
time stochastic process. It also covers some implications for the interpretation of 
Black-Scholes implied volatilities. The pricing formulae can be evaluated rapidly
for special asset price dynamics using the numerical methods introduced in Hes- 
ton (1993), which are explained in Section 14.6. Formulae are also available 
for discrete-time price dynamics. Section 14.7 provides the equations and some 
illustrative results for ARCH models. 


14.2 Definitions, Notation, and Assumptions 
14.2.1 Terminology 


An option is a derivative security that is assumed to be traded. Someone who buys 
an option has the opportunity to make a transaction of an underlying asset at a 
later date. The counterparty to any later transaction will be an option seller. The 
owner of a call option has the opportunity to buy the underlying asset, while the 
owner of a put option has the opportunity to sell. The option contract will specify 
an exercise (or strike) price and one or more times at which the opportunity can 
be exercised. 

For example, on 15 February someone might pay $5 a share to buy a call option 
on a stock with exercise price $50 and only one permitted exercise time, say 15 
March. This person will choose to exercise the option on 15 March if the share 
price is then above $50, as this allows a purchase below the current market price. 
If, however, the share price is below $50 on 15 March, then the owner of the call 
option will have a worthless opportunity which will not be exercised. 

The option is said to be European in the above example because there is only 
one permitted exercise time. In contrast, an American option allows the owner 
to exercise at any time until a final date specified in the contract. Both European 
and American options are traded at major markets and sometimes both types are 
traded on the same underlying asset. 


14.2.2 Prices and Times 


Time is measured in years when we discuss options and we adopt the convention
that the present time is zero. An option contract then includes the exercise price X
and the time T at which the option expires, this time marking the final opportunity
to exercise the option. During the interval from time 0 until the expiry time T the
price of the underlying asset will fluctuate and we denote the price at time t by
$S_t$. To simplify notation, the time subscript is often discarded when $t = 0$, thus
$S = S_0$. During the same time interval a call price will commence at $c = c_0$,
have price $c_t$ at time t, and conclude at $c_T$. Likewise a put price will move from
$p = p_0$ to $p_T$.

In the remainder of this book we concentrate on two problems. First, we
are interested in methods for determining fair market prices c and p for given
inputs that include S, T, X, and the dynamics of the continuous-time stochastic
process $\{S_t\}$. Second, we are also interested in the inverse problem of using one
or more observed option prices to infer one or more properties of the asset price
process $\{S_t\}$.


14.2.3 Interest Rates and Dividends 


Option prices depend on risk-free interest rates. We will generally assume that 
these are constant, which is a pragmatic assumption for options on stocks, stock 
indices, currencies, and commodities. It is at best a dubious assumption when 
valuing options on debt securities, but these are beyond the scope of this book. 
The continuously compounded, constant, risk-free interest rate is denoted by r, so
that \$1 deposited now is worth $\$e^{rT}$ at time T. It is also assumed that borrowing
and lending are possible at the same rate.

Option prices also depend on dividends paid to the owners of the underlying 
asset. We will assume that dividends are paid at a constant rate, denoted by q. A 
foreign exchange deposit will earn the constant risk-free foreign rate of interest, 
which then defines q. Thus, if sterling is the foreign currency, £1 will accumulate
to £$e^{qT}$ at time T. The dividend assumption is trivially applicable to a stock that
pays no dividends up to time T and then $q = 0$. It is a reasonable assumption
for a stock index that contains many stocks that pay dividends at different times.
The assumption then implies that one unit of the stock (index) becomes $e^{qT}$ units
of the stock (index) at time T, after reinvesting all the dividends at zero cost
in the stock (index). For options on commodities, q can be identified with the 
convenience yield (if any) minus storage costs. 

Options can also be written on futures contracts. A long position in the underly- 
ing asset can then be acquired by buying futures, without investing capital which 
can instead be invested at the rate r. Thus we set q = r when we value options 
on futures. Hull (2000) provides more discussion of the parameter q. He also 
describes the valuation of options when the dividend assumption is false. 


14.2.4 Forward Prices 


It is well known that the forward foreign exchange rate F, at time 0 for exchange
at time T, is given by the following function of the spot rate S, the domestic
interest rate r, and the foreign interest rate q:


$$F = S e^{(r-q)T}. \qquad (14.1)$$


Arbitrage profits can be made if F is not obtained from this equation, assuming 
transaction costs can be ignored. The same conclusion holds for any other asset for 
which forward contracts exist and our dividend assumption applies (Hull 2000, 
Appendix 3A). We will define a forward price F by (14.1) even when forward 
contracts do not exist. Any futures price is also given by (14.1), because we assume 
constant interest rates (Cox, Ingersoll, and Ross 1981). 
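
As a quick numerical illustration of (14.1), the following Python sketch evaluates the forward price. The function name `forward_price` and the example inputs (borrowed from the index example used later in Section 14.3.4) are illustrative assumptions, not part of the original text.

```python
import math

def forward_price(S, r, q, T):
    """Forward price from equation (14.1): F = S * exp((r - q) * T)."""
    return S * math.exp((r - q) * T)

# Illustrative inputs: spot 6000, r = 6%, q = 3%, three months to delivery.
F = forward_price(S=6000, r=0.06, q=0.03, T=0.25)
print(round(F, 2))  # about 6045.17

# Setting q = r, as is done for options on futures, gives F = S.
print(forward_price(S=6000, r=0.06, q=0.06, T=0.25))
```

Because r exceeds q in the first call, the forward price lies above the spot price.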


14.2.5 Put-Call Parity


A second no-arbitrage equation is given by the put-call parity equation for Euro- 
pean options. This shows the difference between the prices of calls and puts that 
have the same exercise price and time to expiry: 


$$c - p = S e^{-qT} - X e^{-rT} = (F - X) e^{-rT}. \tag{14.2}$$


This equation must apply (when there are no transaction costs) to prevent
arbitrageurs making money without using any capital (Hull 2000, p. 275). As $c \ge 0$
and $p \ge 0$, (14.2) gives lower bounds for c and p. We deduce that


$$c \ge \max(S e^{-qT} - X e^{-rT}, 0) \quad\text{and}\quad p \ge \max(X e^{-rT} - S e^{-qT}, 0) \tag{14.3}$$


for European options, whatever the price dynamics for $S_t$. The function “max”
selects the maximum of the terms inside the brackets, in (14.3) and in similar 
subsequent expressions. The upper bounds for European options can also be 
deduced from no-arbitrage arguments, and they are 


$$c \le S e^{-qT} \quad\text{and}\quad p \le X e^{-rT}. \tag{14.4}$$
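
The no-arbitrage bounds and the parity relationship can be checked numerically. The sketch below is only an illustration: the helper names `european_bounds` and `put_from_call` are hypothetical, and the call price of 200 is an assumed market quote.

```python
import math

def european_bounds(S, T, X, r, q):
    """Return ((call_lower, call_upper), (put_lower, put_upper)) from (14.3) and (14.4)."""
    s_disc = S * math.exp(-q * T)   # S * exp(-qT)
    x_disc = X * math.exp(-r * T)   # X * exp(-rT)
    call = (max(s_disc - x_disc, 0.0), s_disc)
    put = (max(x_disc - s_disc, 0.0), x_disc)
    return call, put

def put_from_call(c, S, T, X, r, q):
    """Put price implied by put-call parity (14.2) for a given call price."""
    return c - S * math.exp(-q * T) + X * math.exp(-r * T)

call_bounds, put_bounds = european_bounds(S=6000, T=0.25, X=6200, r=0.06, q=0.03)
p = put_from_call(200.0, S=6000, T=0.25, X=6200, r=0.06, q=0.03)  # assumed call quote of 200
print(call_bounds, put_bounds, round(p, 2))
```

A call quote inside its bounds always maps, through parity, to a put price inside the put bounds.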

14.2.6 Boundary Conditions 

At expiry, $c_T$ must be $S_T - X$ if $S_T > X$, and it must be zero if $S_T \le X$. Thus

$$c_T = \max(S_T - X, 0). \tag{14.5}$$

Likewise,

$$p_T = \max(X - S_T, 0). \tag{14.6}$$

American options have the further boundary conditions $c_t \ge \max(S_t - X, 0)$ and
$p_t \ge \max(X - S_t, 0)$ for $0 \le t \le T$.


14.3 Black-Scholes and Related Formulae 


14.3.1 Price Dynamics 


The famous formulae of Black and Scholes (1973) for the fair prices of European 
options follow from several assumptions, which are also discussed in Merton 
(1973). Asset prices $\{S_t\}$ are assumed to follow a GBM, defined in Section 13.3
and represented by the equation


$$dS = \mu S\,dt + \sigma S\,dW. \tag{14.7}$$


Here $\mu$ and $\sigma$ are constants that represent the expected return and volatility per
unit time. All the random variation in the asset prices comes from the Wiener
process $\{W_t\}$.


14.3.2 Assumptions


The formulae are derived from the assumed price dynamics and several further 
assumptions. These include constant interest rates and dividend yields, short sell- 
ing opportunities, no transaction costs, no taxes, and continuous trading of the 
asset and the option. The key insight that leads to the formulae is the assumption 
that no one can make arbitrage profits by owning a portfolio that contains variable 
quantities of the asset and the option. The impossibility of arbitrage profits from 
dynamic portfolios leads to a partial differential equation (PDE) that is satisfied 
by the price of any European derivative security. A specific derivative defines 
particular boundary conditions and hence a particular solution to the fundamental 
PDE. 

We do not state or solve the PDE here, because we prefer to emphasize the 
derivation of option prices from risk-neutral price dynamics. This is explained 
after presenting and illustrating the formulae for call and put options. 


14.3.3 The General Black-Scholes Formulae 


The fair price of a European call option is given by a function of six parameters: 
the asset price S, the time until expiry T, the exercise price X, the risk-free 
interest rate r, the dividend yield q, and the volatility $\sigma$. The original formulae in
Black and Scholes (1973) assume there are no dividends and hence omit q; these 
formulae are given by replacing q by zero in the equations that follow. 

The option price does not depend on $\mu$. This term does not appear in the PDE,
because the return to a riskless portfolio constructed from the option and the 
underlying asset only depends on r. 

The formula includes cumulative probabilities for the standard normal distri- 
bution. Let N (d) be the probability that a standard normal variate is less than or 
equal to d. Then the general Black-Scholes call formula is 


$$c_{BS}(S, T, X, r, q, \sigma) = S e^{-qT} N(d_1) - X e^{-rT} N(d_2)$$


with


$$d_1 = \frac{\log(S/X) + (r - q + \tfrac{1}{2}\sigma^2)T}{\sigma\sqrt{T}} \quad\text{and}\quad d_2 = d_1 - \sigma\sqrt{T}. \tag{14.8}$$


The owner of one call option can create a riskless portfolio by short selling
$\partial c_{BS}/\partial S$ units of the underlying asset. This hedge ratio is given by


$$\frac{\partial c_{BS}}{\partial S} = e^{-qT} N(d_1) \tag{14.9}$$


at time zero.

The put-call parity equation, (14.2), applies to formula prices. Thus,


$$p_{BS}(S, T, X, r, q, \sigma) = c_{BS}(S, T, X, r, q, \sigma) - S e^{-qT} + X e^{-rT}. \tag{14.10}$$

        A                B         D                E

  3     Inputs
  4     S, time 0        6000      sigma*(T^0.5)     0.0750
  5     T                0.25      log(S/X)         -0.0328
  6     X                6200      d1               -0.2997
  7     r                0.06      d2               -0.3747
  8     q                0.03      N(d1)             0.3822
  9     sigma            0.15      N(d2)             0.3539
 10                                S*exp(-qT)       5955.17
 11     Option prices              X*exp(-rT)       6107.69
 12     c                114.3
 13     p                266.8


Exhibit 14.1. Black-Scholes calculations.


Table 14.1. Formulae used in the Black-Scholes spreadsheet.


Cell    Formula

E4      =B9*SQRT(B5)
E5      =LN(B4/B6)
E6      =(E5+((B7-B8)*B5)+0.5*E4*E4)/E4
E7      =E6-E4
E8      =NORMSDIST(E6)
E9      =NORMSDIST(E7)
E10     =B4*EXP(-B8*B5)
E11     =B6*EXP(-B7*B5)
B12     =E10*E8-E11*E9
B13     =E11*(1-E9)-E10*(1-E8)


The general Black-Scholes put formula is therefore 


$$p_{BS}(S, T, X, r, q, \sigma) = X e^{-rT} N(-d_2) - S e^{-qT} N(-d_1) \tag{14.11}$$

as $N(-d) = 1 - N(d)$. Also the hedge ratio is negative and given by

$$\frac{\partial p_{BS}}{\partial S} = -e^{-qT} N(-d_1). \tag{14.12}$$


14.3.4 Examples 


Exhibit 14.1 shows the calculation of Black-Scholes option prices when S =
6000, T = 0.25, X = 6200, r = 0.06, q = 0.03, and $\sigma$ = 0.15. The call and put
prices are then c = 114.3 and p = 266.8. Table 14.1 provides the Excel formulae 
for these calculations. The entire calculation for an option price could be done 
using one very long formula, but then it is easy to make mistakes. Alternatively, 
an Excel VBA user-defined function could be used. 
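
The same calculation can be written as a short function in a general-purpose language. The Python sketch below mirrors the cells of Table 14.1. The function names are illustrative assumptions, and the normal distribution function is obtained from `math.erf` rather than Excel's NORMSDIST.

```python
import math

def norm_cdf(d):
    """Standard normal cumulative probability N(d)."""
    return 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))

def black_scholes(S, T, X, r, q, sigma):
    """Return (call, put) from the general formulae (14.8) and (14.11)."""
    vol_sqrt_t = sigma * math.sqrt(T)                        # cell E4
    log_moneyness = math.log(S / X)                          # cell E5
    d1 = (log_moneyness + (r - q) * T + 0.5 * vol_sqrt_t ** 2) / vol_sqrt_t  # cell E6
    d2 = d1 - vol_sqrt_t                                     # cell E7
    s_disc = S * math.exp(-q * T)                            # cell E10
    x_disc = X * math.exp(-r * T)                            # cell E11
    call = s_disc * norm_cdf(d1) - x_disc * norm_cdf(d2)     # cell B12
    put = x_disc * norm_cdf(-d2) - s_disc * norm_cdf(-d1)    # cell B13
    return call, put

call, put = black_scholes(S=6000, T=0.25, X=6200, r=0.06, q=0.03, sigma=0.15)
print(f"c = {call:.1f}, p = {put:.1f}")  # the text reports c = 114.3 and p = 266.8
```

Keeping the intermediate quantities as named variables plays the same role as the separate spreadsheet cells: each step can be inspected on its own.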

Figure 14.1. Option prices as S varies.

Figure 14.2. Call option prices as $\sigma$ varies.


Figure 14.1 shows how the formula prices vary as S varies, with all the other 
inputs as stated above. The first derivatives of the two functions on this figure are 
the hedge ratios defined by (14.9) and (14.12). 

Figure 14.2 shows the call price c as a function of $\sigma$ when S = 6000, T = 0.25,
r = q = 0.04, and X is either 5500 (dashed curve), 6000 (continuous curve), or
6500 (dotted curve). It can be seen that the call price is almost a linear function 
of the volatility when X equals S. 


14.3.5 Risk-Neutral Pricing


In the real world of risky assets, the asset price dynamics are


$$dS = \mu S\,dt + \sigma S\,dW \tag{14.13}$$


and we refer to the real-world probability measure P. The measure P is used
to calculate the probabilities of events and the expectations of random variables
in the real world. For example, the P-expectation of $S_T$ equals $S \exp(\mu T)$, as
$\log(S_T) - \log(S_0)$ is normal with mean $(\mu - \tfrac{1}{2}\sigma^2)T$ and variance $\sigma^2 T$ when the
probability measure is P.

The drift rate $\mu$ is irrelevant in the Black-Scholes framework. This means that
the market price of risk for the underlying asset is irrelevant. Consequently, the 
theoretical price of an option would be the same if the option and the underlying 
asset were both traded in a fictitious risk-neutral economy (Cox and Ross 1976). 
A different probability measure Q is used to calculate probabilities in the risk- 
neutral world. 

An option's price at time zero equals the present value of the expectation of its 
future price in a risk-neutral world. Thus 


$$c = e^{-rT} E^Q[c_T]. \tag{14.14}$$


Let $f_Q$ denote the risk-neutral density of $S_T$. Then

$$c = e^{-rT} E^Q[\max(S_T - X, 0)] = e^{-rT} \int_X^\infty (x - X) f_Q(x)\,dx. \tag{14.15}$$


These two equations for the fair option price are used several times in this book. 
In a risk-neutral world, the drift rate is determined solely by the risk-free rate 
and the dividend yield. The price dynamics are then 


$$dS = (r - q) S\,dt + \sigma S\,dW \tag{14.16}$$


and $W_t$ is a Wiener process for measure Q. The risk-neutral density of $S_T$ for the
GBM process in (14.16) is the following lognormal density:


$$f_Q(x) = \frac{1}{x \sigma \sqrt{2\pi T}} \exp\left(-\frac{1}{2}\left[\frac{\log(x) - [\log(S) + (r - q - \tfrac{1}{2}\sigma^2)T]}{\sigma\sqrt{T}}\right]^2\right). \tag{14.17}$$


The integrals needed to simplify (14.15) are then


$$\int_X^\infty f_Q(x)\,dx = N(d_2) \quad\text{and}\quad \int_X^\infty x f_Q(x)\,dx = S e^{(r-q)T} N(d_1). \tag{14.18}$$


The first of these integrals is simply the risk-neutral probability of the event
$S_T > X$, while the second can be evaluated by making the substitution y = log(x).
Inserting the above expressions for the integrals into (14.15) gives us the Black- 
Scholes call price formula. 
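
The link between the integral (14.15) and the closed-form price can be confirmed numerically. The following Python sketch integrates the payoff against the lognormal density (14.17) with a simple trapezoidal rule and compares the result with the formula price. The function names, the truncation of the upper limit at ten standard deviations of log($S_T$), and the number of grid points are all assumptions made for this illustration.

```python
import math

def norm_cdf(d):
    return 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))

def bs_call(S, T, X, r, q, sigma):
    """Closed-form call price (14.8)."""
    s = sigma * math.sqrt(T)
    d1 = (math.log(S / X) + (r - q) * T + 0.5 * s * s) / s
    return S * math.exp(-q * T) * norm_cdf(d1) - X * math.exp(-r * T) * norm_cdf(d1 - s)

def risk_neutral_density(x, S, T, r, q, sigma):
    """Lognormal density (14.17) of the asset price at time T under measure Q."""
    s = sigma * math.sqrt(T)
    m = math.log(S) + (r - q - 0.5 * sigma ** 2) * T
    z = (math.log(x) - m) / s
    return math.exp(-0.5 * z * z) / (x * s * math.sqrt(2.0 * math.pi))

def call_by_integration(S, T, X, r, q, sigma, n=20000, n_sd=10.0):
    """Trapezoidal estimate of (14.15); assumes the truncation point exceeds X."""
    s = sigma * math.sqrt(T)
    m = math.log(S) + (r - q - 0.5 * sigma ** 2) * T
    upper = math.exp(m + n_sd * s)      # density is negligible beyond this point
    h = (upper - X) / n
    total = 0.0
    for i in range(n + 1):
        x = X + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * (x - X) * risk_neutral_density(x, S, T, r, q, sigma)
    return math.exp(-r * T) * h * total

args = dict(S=6000, T=0.25, X=6200, r=0.06, q=0.03, sigma=0.15)
print(call_by_integration(**args), bs_call(**args))
```

The two numbers should agree closely, which is the numerical counterpart of inserting (14.18) into (14.15).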


14.3.6 Approximations 


The Black-Scholes formulae can be simplified by using the forward price F 
instead of the underlying asset price S. From (14.1), $S \exp(-qT) = F \exp(-rT)$
and so

$$c_{BS} = e^{-rT} [F N(d_1) - X N(d_1 - \sigma\sqrt{T})]$$

with

$$d_1 = \frac{\log(F/X) + \tfrac{1}{2}\sigma^2 T}{\sigma\sqrt{T}}. \tag{14.19}$$


The cumulative function N(d) can be approximated by the first two terms in a
series expansion when d is near zero, thus:


$$N(d) \cong N(0) + d N'(0) = \frac{1}{2} + \frac{d}{\sqrt{2\pi}}. \tag{14.20}$$

This approximation implies that the formula price is almost a linear function of
$d_1$ when both $d_1$ and $d_1 - \sigma\sqrt{T}$ are close to zero. This occurs in particular for
at-the-money (ATM) options, defined by X = F. ATM options have $d_1 = \tfrac{1}{2}\sigma\sqrt{T}$
and $d_2 = -d_1$. Assuming $\sigma\sqrt{T}$ is not large, the approximate price of an ATM
call option is

$$c_{BS} \cong \sqrt{\frac{T}{2\pi}}\, F e^{-rT} \sigma. \tag{14.21}$$


By put-call parity, the same approximation is valid for ATM put options. ATM
option prices are therefore approximately linear functions of $\sigma$ when $\sigma\sqrt{T}$ is not
large.
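
A small numerical comparison shows how close the ATM approximation is in practice. The Python sketch below evaluates the forward-price form (14.19) and the approximation (14.21) for a few volatilities. The function names and the inputs (F = 6000, T = 0.25, r = 0.04) are assumptions chosen for illustration.

```python
import math

def norm_cdf(d):
    return 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))

def bs_call_forward(F, T, X, r, sigma):
    """Black-Scholes call written in terms of the forward price, as in (14.19)."""
    s = sigma * math.sqrt(T)
    d1 = (math.log(F / X) + 0.5 * s * s) / s
    return math.exp(-r * T) * (F * norm_cdf(d1) - X * norm_cdf(d1 - s))

def atm_call_approx(F, T, r, sigma):
    """Approximation (14.21) for an at-the-money call (X = F)."""
    return math.sqrt(T / (2.0 * math.pi)) * F * math.exp(-r * T) * sigma

F, T, r = 6000.0, 0.25, 0.04
for sigma in (0.10, 0.15, 0.30):
    exact = bs_call_forward(F, T, F, r, sigma)
    approx = atm_call_approx(F, T, r, sigma)
    print(f"sigma={sigma:.2f}  exact={exact:.2f}  approx={approx:.2f}")
```

The approximation slightly overstates the exact price, and the relative gap grows with $\sigma\sqrt{T}$, in line with the condition stated in the text.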


14.3.7 Deterministic Volatility 


The assumption of constant volatility can be relaxed to permit volatility to be a 
deterministic function of time, which might reflect intraday periodic effects. If we 
now replace $\sigma$ by $\sigma_t$ in the price dynamics equations, we can redefine the quantity
$\sigma^2$ to be the average variance per unit time:


$$\sigma^2 = \frac{1}{T} \int_0^T \sigma_t^2\,dt. \tag{14.22}$$


The new quantity $\sigma$ given by (14.22) is a constant. The risk-neutral density of $S_T$
is still given by (14.17) and thus the Black-Scholes formulae remain applicable.
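
As an illustration of (14.22), the sketch below averages the squared volatility over the option's life with a midpoint rule and returns the constant volatility to be used in the Black-Scholes formulae. The function names and the two-level volatility path are hypothetical choices, not taken from the text.

```python
import math

def effective_volatility(sigma_fn, T, n=1000):
    """Square root of the average variance in (14.22), using a midpoint rule."""
    h = T / n
    avg_var = sum(sigma_fn((i + 0.5) * h) ** 2 for i in range(n)) * h / T
    return math.sqrt(avg_var)

def two_level(t, T=0.25):
    """Hypothetical deterministic volatility: 10% in the first half, 20% afterwards."""
    return 0.10 if t < 0.5 * T else 0.20

sigma = effective_volatility(two_level, T=0.25)
print(round(sigma, 4))  # sqrt((0.01 + 0.04) / 2), about 0.1581
```

Note that the effective volatility is the root mean square of $\sigma_t$, which exceeds the simple average of 15% in this example.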


14.3.8 American Formulae 


Numerical methods are required to price American options. There are many meth- 
ods available, including the accurate analytic approximations of Barone-Adesi and 
Whaley (1987). Assuming q is positive, their call price formula first requires an 
iterative calculation of the critical price $S^*$, above which a call option should be
exercised at time zero. The approximate American price is then


$$C_{American} = \begin{cases} c_{European} + A (S/S^*)^{\gamma} & \text{if } S < S^*, \\ S - X & \text{if } S \ge S^*. \end{cases} \tag{14.23}$$

The terms $S^*$, A, and $\gamma$ are functions of all the input parameters except S. Their
formula for the approximate American put price has a similar structure. 


14.4 Implied Volatility


14.4.1 Definition


All the inputs to the Black-Scholes formulae are observable except the volatility
parameter $\sigma$. Anyone using a formula to price a specific option only needs to
choose a value for $\sigma$. Conversely, any appropriate market price will reveal a value
of $\sigma$ that equates the market and formula prices. This revealed value is of particular
interest to us because it is a natural forecast of future volatility. The accuracy of 
such forecasts is covered in Chapter 15. 

The implied volatility for a European call option, traded at the price $c_{market}$, is
the number $\sigma_{implied}$ that solves the equation:


$$c_{market} = c_{BS}(S, T, X, r, q, \sigma_{implied}), \tag{14.24}$$


whenever a solution exists. Note first that any solution is unique. This follows 
from the fact that $c_{BS}$ increases when $\sigma$ increases, keeping all other inputs fixed.
The vega of the option is the following partial derivative (Hull 2000, p. 328):


$$\frac{\partial c_{BS}}{\partial \sigma} = S e^{-qT} \sqrt{T}\, \phi(d_1), \tag{14.25}$$


with $\phi(\cdot)$ the density function of the standard normal distribution. Vega is positive
for all $\sigma > 0$, because it is the product of positive terms. Note second that a solution
always exists whenever the market price is inside the bounds given by (14.3) and
(14.4). This happens because the limit of $c_{BS}$ as $\sigma \to 0$ is the lower bound, while
the limit as $\sigma \to \infty$ is the upper bound.

The implied volatility for a European put option is defined by solving 


$$p_{market} = p_{BS}(S, T, X, r, q, \sigma_{implied}). \tag{14.26}$$


A unique solution again exists when the market price is within the rational 
bounds. From the put-call parity relationship for formula prices, (14.10), firstly 
$\partial p_{BS}/\partial\sigma = \partial c_{BS}/\partial\sigma$ and secondly the implied volatilities of call and put options
are equal whenever their market prices satisfy put-call parity (Hull 2000, p. 436).
As observed market prices at least approximately satisfy the parity equation, we 
should expect to see similar implied volatilities for call and put options that have 
a common exercise price X and a common time until expiry T. 

An approximate formula for the implied volatility is available for ATM options. 
From (14.21) there is the proportional approximation 


$$\sigma_{implied} \cong \sqrt{\frac{2\pi}{T}}\, \frac{e^{rT}}{F}\, c_{market} \tag{14.27}$$


when the exercise price X equals the forward price F. 
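
The proportional approximation is a one-line calculation. In the Python sketch below the function name is an assumption and the ATM call price of 177.7 is a hypothetical quote chosen for illustration. The inputs F = 6000, T = 0.25, and r = 0.04 are likewise assumed.

```python
import math

def atm_implied_approx(c_market, F, T, r):
    """Approximate ATM implied volatility from (14.27); only valid when X = F."""
    return math.sqrt(2.0 * math.pi / T) * math.exp(r * T) * c_market / F

# Hypothetical ATM call quote of 177.7 on a forward price of 6000.
sigma_hat = atm_implied_approx(c_market=177.7, F=6000.0, T=0.25, r=0.04)
print(round(sigma_hat, 4))  # close to 0.15

# Round trip: pricing with (14.21) and inverting with (14.27) recovers sigma.
c_approx = math.sqrt(0.25 / (2.0 * math.pi)) * 6000.0 * math.exp(-0.04 * 0.25) * 0.20
print(atm_implied_approx(c_approx, 6000.0, 0.25, 0.04))
```

Such a value can also serve as a starting point for an exact numerical search.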

The implied volatility for an American option is defined by replacing the Black- 
Scholes formulae in (14.24) and (14.26) by the appropriate American pricing 
formulae. 

Figure 14.3. The implied volatility solution.


14.4.2 Calculation 


Suppose, for example, that $c_{market}$ is 200 when S = 6000, T = 0.25, X = 6200,
r = 0.06, and q = 0.03. Figure 14.3 shows how $c_{BS}$ then depends on $\sigma$. The
implied volatility could be found by locating the market price on the vertical 
axis and then reading off the volatility value on the horizontal axis. This gives an 
answer of approximately 22%. The exact solution requires a numerical method, 
because the required inverse function cannot be stated compactly. Excel’s Solver 
function could be used, for example, to give an implied volatility of 22.41%. 
Other numerical methods are also fast, including interval subdivision and the 
Newton-Raphson method using the derivative in (14.25). The subdivision algo- 
rithm might start by noting that the solution to our example is obviously between 
10% and 30%. Then find the formula price at the midpoint, 20%, and see that it 
is below the market price. The solution must therefore be in the shorter interval 
from 20% to 30%. Try the new midpoint, 25%, to discover that a shorter range is 
20-25% and then repeat the process until a desired level of accuracy is achieved. 
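
Both numerical methods described above are short to implement. The Python sketch below is one possible version, with illustrative function names, tolerances, and starting values. The bisection bracket of 10% to 30% follows the example in the text, and the Newton-Raphson step uses the vega in (14.25).

```python
import math

def norm_cdf(d):
    return 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))

def norm_pdf(d):
    return math.exp(-0.5 * d * d) / math.sqrt(2.0 * math.pi)

def d_one(S, T, X, r, q, sigma):
    s = sigma * math.sqrt(T)
    return (math.log(S / X) + (r - q) * T + 0.5 * s * s) / s

def bs_call(S, T, X, r, q, sigma):
    d1 = d_one(S, T, X, r, q, sigma)
    d2 = d1 - sigma * math.sqrt(T)
    return S * math.exp(-q * T) * norm_cdf(d1) - X * math.exp(-r * T) * norm_cdf(d2)

def bs_vega(S, T, X, r, q, sigma):
    """Partial derivative of the call price with respect to sigma, (14.25)."""
    return S * math.exp(-q * T) * math.sqrt(T) * norm_pdf(d_one(S, T, X, r, q, sigma))

def implied_vol_bisection(c_market, S, T, X, r, q, lo=0.10, hi=0.30, tol=1e-8):
    """Interval subdivision; the bracket [lo, hi] must contain the solution."""
    if not bs_call(S, T, X, r, q, lo) <= c_market <= bs_call(S, T, X, r, q, hi):
        raise ValueError("market price is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_call(S, T, X, r, q, mid) < c_market:
            lo = mid        # formula price too low, so the solution is above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def implied_vol_newton(c_market, S, T, X, r, q, sigma=0.20, tol=1e-8, max_iter=50):
    """Newton-Raphson iterations starting from an assumed initial guess."""
    for _ in range(max_iter):
        diff = bs_call(S, T, X, r, q, sigma) - c_market
        if abs(diff) < tol:
            break
        sigma -= diff / bs_vega(S, T, X, r, q, sigma)
    return sigma

args = dict(S=6000, T=0.25, X=6200, r=0.06, q=0.03)
print(f"{implied_vol_bisection(200.0, **args):.4f}")  # the text reports 22.41%
print(f"{implied_vol_newton(200.0, **args):.4f}")
```

Bisection is slower but cannot leave its bracket, whereas Newton-Raphson typically needs only a few iterations when the starting value is sensible.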


14.4.3 The Implied Volatility Matrix 


Options are traded for several pairs of the contract parameters (T, X) at any time. 
Each call option has an implied volatility, from which a matrix can be formed 
with rows defined by T and columns defined by X. There will be a similar matrix 
for put options. We might expect identical numbers in a matrix if traders believe 
and act on the assumptions of the Black-Scholes formulae. This does not occur in 
empirical matrices, so that a unique volatility forecast is not provided by a matrix 
of option prices. 

Table 14.2 is a typical example, for the settlement prices of FTSE 100 options
on 4 March 2002. The lives T of the options range from 18 to 200 days. The
matrix terms are the averages of call and put implieds. On the illustrative day they
generally increase as T increases and decrease as X increases. This can also be
seen from Figure 14.4.

Table 14.2. A matrix of implied volatilities. The tabulated numbers are percentage implied
volatilities for FTSE 100 options on 4 March 2002. The settlement price of March futures
was 5241 on this day.

                               Exercise price
Expiry month   4825   4925   5025   5125   5225   5325   5425   5525

March          20.5   18.2   15.7   14.5   13.9   13.2   13.7   14.4
April          22.0   20.4   18.9   17.7   16.9   16.3   15.9   15.6
May            21.3   20.0   19.2   18.4   17.7   17.1   16.6   16.6
June           21.0   20.0   19.0   18.6   17.8   17.2   16.9   16.5
September      20.9   20.4   19.9   19.5   18.9   18.5   18.2   17.6

Figure 14.4. FTSE implied volatilities on 4 March 2002.

The variation among empirical implied volatilities at any moment in time is 
explained by data limitations, microstructure effects, and the models used to set 
prices. Implieds contain measurement error if the inputs are not contemporaneous, 
in particular if $c_{market}$ and S are not measured at the same time. This occurs when
options and the underlying asset trade at different times. It also occurs when there 
are stale prices in a spot stock index. Even when individual implieds come from 
contemporaneous prices, the entries in the matrix may not be for identical times. 
Microstructure effects arise because options have discrete bid and ask prices, 
which define different implieds. There will be variation in the matrix entries 
whenever some option trades are at the bid and others are at the ask. 

Even if we are able to eliminate measurement error and microstructure effects
from the implied volatility matrix we will still find significant variation among
the implied volatilities. Traders are smart people and they know that asset prices
do not follow a GBM. Consequently, when they value options they consider more
complicated dynamics for asset prices, which leads to more complicated pricing
formulae and variation within the implied matrix. Their formulae may well include
premia for risks that cannot be hedged. For example, equity implieds can reflect
fears that the stock market may crash.

Figure 14.5. Sterling/dollar implied volatilities.


14.4.4 Term Effects 


A term structure of implied volatilities is given by varying the time T for a fixed 
level of either the exercise price X or the ratio X/F. The shape of the term 
structure generally reflects expected changes in future volatility. As volatility 
appears to be a mean-reverting process, the same characteristic often appears in 
the term structure. Thus the term structure usually declines as T increases when 
short-term implieds are high, and conversely when they are low. 

Figure 14.5 shows daily time series of one-month and three-month ATM implied 
volatilities for the sterling/dollar rate from July 1988 to December 1998. On day n
let $\sigma_{n,1}$ and $\sigma_{n,3}$ denote the ATM implieds when T respectively equals one month
and three months. The one-month series, $\{\sigma_{n,1}\}$, has average 10.4%, minimum
4.1%, maximum 24.0%, and high values around the sterling debacle in September
1992. The three-month series, $\{\sigma_{n,3}\}$, averages 10.6% with extremes at 4.6% and
19.0%. The spread, $\sigma_{n,3} - \sigma_{n,1}$, has a standard deviation of 0.9%. It is nearly
always positive when $\sigma_{n,1}$ is below 8% and is nearly always negative when $\sigma_{n,1}$
is above 14%.

Figure 14.6. Fitted implied volatility smiles for DM/$ calls.


Studies of the dynamics of the term structure include Stein (1989) for S&P 
100 options, Heynen, Kemna, and Vorst (1994) for Dutch equity options, Xu and 
Taylor (1994) and Campa and Chang (1995) for FX options, and Byoun, Kwok, 
and Park (2003) for FX and S&P 500 options. 


14.4.5 Smile Effects 


Two contrasting pictures for implied volatilities have been obtained by varying 
the exercise price X for a fixed time T. First, currency options generally pro- 
duce a U-shape, or "smile," with the minimum implied near the forward price. 
Second, equity index options generally give a steady decline in implieds as X 
increases, sometimes called a “smirk.” These shapes can be attributed to the 
demand for options and/or to traders incorporating assumptions about the distri- 
bution of asset prices into option prices (Bates 2003; Bollen and Whaley 2004). 
Stochastic volatility increases the kurtosis of multi-period returns, while any neg- 
ative correlation between volatility and prices shows up in negative skewness as 
noted in Section 11.9. Negative correlation is a feature of stock dynamics and 
explains why stocks have a different smile shape to foreign exchange. A mathe- 
matical discussion of smile shapes follows in Sections 14.5 and 14.6. 

Figure 14.6 first appeared in Taylor and Xu (1994a) and shows the result of
regressing a ratio of implieds for DM/$ calls, $\sigma_{implied}(X)/\sigma_{implied}(F)$, against
the moneyness variable defined by log(F/X). The symbols show fitted values
for various T. The U-shapes are clear and they flatten as T increases. Further 
U-shapes can be seen in Campa, Chang, and Reider (1998). 

Figure 14.4 shows the typical equity shape for a single day of FTSE 100 options. 
The general decline in equity implieds as X increases is emphasized by Rubinstein 
(1994). His Figure 2 for a typical day has an S&P 500 index implied of 24% 
when the ratio X/S equals 0.84, but the implied is only 14% when the ratio is 
1.08. Within this range there are a further twelve exercise prices and the implied 
volatility is a monotonic decreasing function of X. Further equity pictures can
be seen in Dumas, Fleming, and Whaley (1998), Duffie et al. (2000), Aït-Sahalia
(2002), and Pan (2002). 


14.4.6 Implied Surfaces 


A three-dimensional picture of all the implieds plotted against T and X provides 
a surface plot. Models for the dynamic behavior of the surface and illustrative 
examples can be found in Skiadopoulos, Hodges, and Clewlow (1999), Tompkins 
(2001), Cont and da Fonseca (2002), and Panigirtzoglou and Skiadopoulos (2004). 


14.4.7 Implied Indices 


There are several ways to combine implieds into a single representative number. 
An important example is the VIX index of Fleming, Ostdiek, and Whaley (1995), 
which represents a hypothetical American option on the S&P 100 index that is 
at-the-money and thirty calendar days from expiry. Four call and four put implieds 
are weighted to define the index, using pairs of exercise prices that bracket the 
asset level combined with the nearby and next expiry dates. A new VIX index is 
based upon a model-free variance expectation, given later by equation (14.44). 


14.5 Option Prices when Volatility Is Stochastic 


Option prices can be calculated for continuous-time stochastic volatility models, 
providing we make some assumptions. We now suppose there are no jumps in 
the price and volatility processes, show how option prices can be calculated, and 
then consider the interpretation of implied volatility when volatility is stochastic. 


14.5.1 Theoretical Framework 


The theory of stochastic volatility option pricing was developed by Hull and 
White (1987), Johnson and Shanno (1987), Scott (1987), and Wiggins (1987). 
The mathematics and economic theory are presented in detail in books on this 
subject by Fouque, Papanicolaou, and Sircar (2000) and Lewis (2000). We follow 
the analysis in the two books and in Hull and White (1987, 1988). 

The two state variables, defined by the asset price $S_t$ and variance $V_t = \sigma_t^2$, are
supposed to have real-world dynamics defined by a pair of diffusion equations


$$dS = \mu S\,dt + \sigma S\,dW \tag{14.28}$$

and

$$dV = \alpha\,dt + \eta\,dZ \tag{14.29}$$


with correlation $\rho$ between the Wiener processes $\{W_t\}$ and $\{Z_t\}$, often referred
to as the correlation between dW and dZ. We assume the correlation is strictly
between $-1$ and 1. The terms $\mu$, $\alpha$, and $\eta$ are typically general functions of V
and a parameter vector $\theta$. For the real-world volatility process we can then write


$$dV = \alpha(V, \theta_P)\,dt + \eta(V, \theta_P)\,dZ. \tag{14.30}$$


An important example is the square-root volatility model, defined for real-world
volatility by

$$dV = (a_P - b_P V)\,dt + \xi_P \sqrt{V}\,dZ \tag{14.31}$$


with three positive parameters $a_P$, $b_P$, and $\xi_P$.

The market is not complete because there are two state variables and only one of 
them is traded. Consequently, risk-neutral pricing methods alone cannot be used 
to obtain a unique option price. Instead there are an infinite number of equivalent 
probability measures Q, each reflecting a different risk-premium function for 
volatility, for which discounted prices, $\exp(-(r - q)t) S_t$, are a martingale. It is
standard to assume the price dynamics for measures P and Q are provided by the 
same family of diffusion processes. The risk-neutral process is then 


$$dS = (r - q) S\,dt + \sigma S\,dW \tag{14.32}$$

and

$$dV = \alpha(V, \theta_Q)\,dt + \eta(V, \theta_P)\,dZ, \tag{14.33}$$


with the same correlation $\rho$ between the Wiener processes $\{W_t\}$ and $\{Z_t\}$ as for the
real-world process. Note that only the drift rates have changed, from $\mu(V, \theta_P) S$
to $(r - q) S$ and from $\alpha(V, \theta_P)$ to $\alpha(V, \theta_Q)$. For the square-root example the
risk-neutral volatility dynamics simply become


$$dV = (a_Q - b_Q V)\,dt + \xi_P \sqrt{V}\,dZ. \tag{14.34}$$


European option prices are the discounted expected payoffs at expiry, using the
density function for $S_T$ defined by measure Q and denoted by $f_Q$. Thus the fair
price for a call is


$$c = e^{-rT} E^Q[\max(S_T - X, 0)] = e^{-rT} \int_X^\infty (x - X) f_Q(x)\,dx. \tag{14.35}$$


This equation is identical to (14.15), but now the density $f_Q$ is not lognormal.


14.5.2 Volatility Risk 


We discuss the market price of volatility risk before explaining how option prices 
can be calculated. The change in the volatility drift from the real-world dynamics 
to the risk-neutral dynamics is, in theory, determined by the pricing of the risk 
accepted when holding an asset whose value will change if volatility changes. 
This risk premium can be time-varying and the assumption that it has a form that 
allows us to retain the function $\alpha(V)$ in (14.33) is merely pragmatic.

There are three common ways to deal with the risk premium issue. The first
assumes $\theta_P = \theta_Q$ and hence the premium is zero. This is often stated as an
assumption that volatility is uncorrelated with aggregate consumption (Hull and
White 1987, p. 283). The parameters $\theta_Q$ can then be estimated from historical
data, i.e. from a time series of underlying asset prices. The second way assumes
that we can retain the function $\alpha(V)$. Some parameters can then be obtained
from historical data (e.g. $\xi_P$ in (14.34)), but others (e.g. $a_Q$, $b_Q$) must be inferred
from option prices. The third way commences option pricing with the risk-neutral
dynamics, without taking any position about the volatility risk premium. All the
parameters $\theta_Q$ are then inferred from a panel of option prices.


14.5.3 Calculations 


Stochastic volatility option prices can sometimes be calculated from “closed- 
form” equations. This can be done for the square-root volatility model and some 
more complicated specifications, as discussed in the next section. Other speci- 
fications, such as the continuous-time version of the standard SV model men- 
tioned in Section 13.4, have no known “closed-form” and option prices must be 
obtained by Monte Carlo methods. In the most general situation, when $\rho \ne 0$,
equations (14.32) and (14.33) must then be rewritten in discrete time and simulated.
We could obtain asset prices $S_{i,T}$ at time T from simulations labeled by
i = 1, 2, ..., N. The integral in (14.35) would then be estimated by


$$\hat{c} = \frac{e^{-rT}}{N} \sum_{i=1}^{N} \max(S_{i,T} - X, 0). \tag{14.36}$$


However, the same accuracy can be obtained much faster by using variance reduc- 
tion methods, such as the antithetic and control variate methods (Hull and White 
1987; Boyle, Broadie, and Glasserman 1997). 
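
A plain Monte Carlo estimate of (14.36) can be sketched as follows for the square-root model (14.34). This is a minimal illustration under stated assumptions, not a production method: it uses a simple Euler discretization with the variance floored at zero inside the square root, no variance reduction, and hypothetical parameter values. The function name and all defaults are illustrative.

```python
import math
import random

def sv_call_monte_carlo(S0, V0, T, X, r, q, a, b, xi, rho,
                        n_paths=20000, n_steps=10, seed=12345):
    """Estimate (14.36) by simulating discretized versions of (14.32) and (14.34)."""
    rng = random.Random(seed)
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    rho_bar = math.sqrt(1.0 - rho * rho)
    payoff_sum = 0.0
    for _ in range(n_paths):
        log_s, v = math.log(S0), V0
        for _ in range(n_steps):
            z1 = rng.gauss(0.0, 1.0)                       # shock to the price
            z2 = rho * z1 + rho_bar * rng.gauss(0.0, 1.0)  # correlated shock to variance
            v_pos = max(v, 0.0)                            # floor keeps the square root real
            vol = math.sqrt(v_pos)
            log_s += (r - q - 0.5 * v_pos) * dt + vol * sqrt_dt * z1
            v += (a - b * v_pos) * dt + xi * vol * sqrt_dt * z2
        payoff_sum += max(math.exp(log_s) - X, 0.0)
    return math.exp(-r * T) * payoff_sum / n_paths

# Hypothetical risk-neutral parameters: long-run variance a/b = 0.0225 (15% volatility).
price = sv_call_monte_carlo(S0=6000, V0=0.0225, T=0.25, X=6200, r=0.06, q=0.03,
                            a=0.045, b=2.0, xi=0.3, rho=-0.5)
print(round(price, 1))
```

Setting xi to zero with a = b * V0 keeps the variance constant, so the estimate should then be close to the Black-Scholes price, which is a useful check on the simulation code.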


14.5.4 The Zero Correlation Special Case 


The fair price of a European call option has a simpler representation when the two 
Wiener processes $\{W_t\}$ and $\{Z_t\}$ are independent and any risk premium function
is also independent of $\{W_t\}$. These are reasonable assumptions for some assets,
particularly foreign exchange. Hull and White (1987) show the fair price is an
expectation across Black-Scholes prices. Let


$$\bar{V} = \frac{1}{T} \int_0^T V_t\,dt \tag{14.37}$$


be the average variance during the life of the option, with g(v) its density function
for the risk-neutral dynamics conditional on the initial value $V_0$. Also let $c_{BS}(\sqrt{v})$
now denote the Black-Scholes price when the volatility rate is $\sqrt{v}$. Then the
Hull-White price is

$$c = E^Q[c_{BS}(\sqrt{\bar{V}}) \mid V_0] = \int_0^\infty c_{BS}(\sqrt{v})\, g(v)\,dv. \tag{14.38}$$


This integral can be evaluated by Monte Carlo methods but only $V_t$ then needs to
be simulated. 
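
The following Python sketch illustrates this mixing approach for the square-root process. Only the variance path is simulated. Each path yields an average variance as in (14.37), and the option price is the mean of the corresponding Black-Scholes prices as in (14.38). The Euler scheme, the floor on the variance, the function names, and the parameter values are all assumptions made for illustration.

```python
import math
import random

def norm_cdf(d):
    return 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))

def bs_call(S, T, X, r, q, sigma):
    s = sigma * math.sqrt(T)
    d1 = (math.log(S / X) + (r - q) * T + 0.5 * s * s) / s
    return S * math.exp(-q * T) * norm_cdf(d1) - X * math.exp(-r * T) * norm_cdf(d1 - s)

def hull_white_call(S, V0, T, X, r, q, a, b, xi,
                    n_paths=2000, n_steps=50, seed=2024):
    """Average of Black-Scholes prices over simulated average variances (rho = 0)."""
    rng = random.Random(seed)
    dt = T / n_steps
    total = 0.0
    for _ in range(n_paths):
        v, v_sum = V0, 0.0
        for _ in range(n_steps):
            v_pos = max(v, 0.0)
            v_sum += v_pos * dt                     # left-endpoint rule for (14.37)
            v += (a - b * v_pos) * dt + xi * math.sqrt(v_pos * dt) * rng.gauss(0.0, 1.0)
        v_bar = max(v_sum / T, 1e-12)               # guard against a zero average variance
        total += bs_call(S, T, X, r, q, math.sqrt(v_bar))
    return total / n_paths

args = dict(S=6000, T=0.25, X=6200, r=0.06, q=0.03)
hw = hull_white_call(V0=0.0225, a=0.045, b=2.0, xi=0.1, **args)
print(round(hw, 2), round(bs_call(sigma=0.15, **args), 2))
```

Because the price shocks are integrated out analytically by the Black-Scholes formula, far fewer paths are needed here than when both state variables are simulated.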


14.5.5 Implied Volatilities for the Special Case 


Option prices from a stochastic volatility model define implied volatilities in 
exactly the same way as do observed market prices. These SV implieds help us 
to see how SV option prices differ from Black-Scholes prices and can partially 
explain the general smile shapes observed in empirical implieds. 

The simplest case is when $\rho = 0$ and (14.38) applies. The simplest approximation
then notes that at-the-money BS option prices are almost proportional to
volatility, when $\sigma\sqrt{T}$ is not large (equation (14.21)), so that $c_{BS}$ and $E^Q$ can be
interchanged in (14.38) to give


$$c \cong c_{BS}(E^Q[\sqrt{\bar{V}} \mid V_0]). \tag{14.39}$$


This result indicates that implieds are approximately the risk-neutral expected
value of $\sqrt{\bar{V}}$. We may therefore say that an implied volatility is approximately
the market's expectation of volatility during the lifetime of the option if these 
three conditions all apply: 


(i) there is no volatility risk premium; 
(ii) there is no correlation between price and volatility shocks; 
(iii) the option is approximately at-the-money. 

More accurate approximations when $\rho = 0$ are obtained by series expansions.
Let $\mu_{\bar{V}}$ and $\sigma_{\bar{V}}$ be the mean and standard deviation of the risk-neutral random
variable $\bar{V}$, for a fixed value of T and conditional on a specific value for $V_0$. Taylor
and Xu (1994a) and Ball and Roma (1994) both show that the approximate square
of the implied volatility is


$$\sigma_{implied}^2 \cong \mu_{\bar{V}} + \frac{\sigma_{\bar{V}}^2}{4 T \mu_{\bar{V}}^2} \left[ \left(\log\frac{X}{F}\right)^2 - \mu_{\bar{V}} T - \frac{1}{4} \mu_{\bar{V}}^2 T^2 \right]. \tag{14.40}$$


The quadratic function of log(X/ F) obviously generates a U-shaped smile as X 
varies. The exact minimum of the smile occurs when the exercise price X equals 
the forward price F (Renault and Touzi 1996) and then the implied is less than 
the square root of the conditional expectation of $\bar{V}$ (Hull and White 1987).

Taylor and Xu (1994a) also provide an approximation for the ratio of two
implieds,

$$\frac{\sigma_{implied}(X)}{\sigma_{implied}(F)} \cong 1 + \frac{\sigma_{\bar{V}}^2}{8 T \mu_{\bar{V}}^3} \left(\log\frac{X}{F}\right)^2. \tag{14.41}$$

Figure 14.7. Theoretical ratios of implieds for a square-root process.


The ratio depends on T, $V_0$, and the process defining $V_t$. It decreases towards 1 as
T increases, when either $V_t$ or its logarithm follows a mean-reverting
Ornstein-Uhlenbeck (OU) process. Empirical analysis of several years of FX implieds
finds smiles whose magnitudes are very approximately double the size predicted
by (14.41). Figure 14.6 shows the fitted relationship between the ratios and M =
log(F/X), from a regression of ratios on $M/\sqrt{T}$, $M^2/\sqrt{T}$, $M/T$, $M^2/T$, and
other variables.


14.5.6 Implied Volatilities for the Square-Root Process 


It is much more difficult to find general results about implieds when the correlation
$\rho$ is not restricted. The only well-known results are for the square-root volatility
process given by (14.34). Hull and White (1988) present option prices as infinite
series in the “volatility of volatility” parameter $\xi$, previously called $\xi_P$. Then


$$c = c_{BS}\left(\sqrt{E^Q[\bar{V}]}\right) + f_1 \xi + f_2 \xi^2 + \cdots \tag{14.42}$$


for functions $f_i$ of the variables $a_Q$, $b_Q$, $\rho$, $V_0$, r, q, T, and X. The first three terms
of the series shown above provide an accurate approximation to the exact solution
for realistic parameter values. Taylor and Xu (1994b) use Hull and White's approximate
formula for c to derive approximations for implied volatilities, including
the following ratio: 


$$\frac{\sigma_{implied}(X)}{\sigma_{implied}(F)} \cong 1 + \rho \xi A_1 \log(X/F) + (A_2 + A_3 \rho^2)\, \xi^2 (\log(X/F))^2 \tag{14.43}$$


with each of $A_1$, $A_2$, and $A_3$ being positive functions of the variables $a_Q$, $b_Q$, $V_0$,
and T. This formula shows implieds are approximately a quadratic function of log(X/F),
but now the minimum implied will not occur at the forward price when $\rho \ne 0$.
The minimum implied is in fact far from F, for realistic negative values of
$\rho$, when the exact implieds are calculated from the exact option pricing formula
of Heston (1993). Figure 14.7 shows typical exact ratios of implieds for 30-day
options, based upon parameter values estimated by Taylor and Xu (1994b) from
the prices of S&P 500 futures. When $\rho$ is either -0.35 or -0.7 the ratio is
approximately a linear function of p over the range shown in Figure 14.7. 


14.5.7 Variance Expectations 


Carr and Madan (1998) and Britten-Jones and Neuberger (2000) show that a
complete set of call option prices $c(X)$ can be used to infer the risk-neutral
expectation of the integrated variance until time $T$ when the risk-neutral
price and volatility dynamics are defined by the diffusion equations (14.32)
and (14.33). From Itô's lemma,

$$\frac{dS}{S} - d(\log S) = \tfrac{1}{2} V\,dt$$

and thus

$$E^Q\Big[\int_0^T V_t\,dt\Big] = 2(r - q)T - 2E^Q[\log(S_T/S)] = -2E^Q[\log(S_T/F)].$$

The payoff $\log(S_T)$ can be replicated by investing in a static portfolio
that takes positions in option contracts. Consequently, it can be shown that

$$E^Q\Big[\int_0^T V_t\,dt\Big] = 2e^{rT}\Big[\int_0^F \frac{p(X)}{X^2}\,dX + \int_F^\infty \frac{c(X)}{X^2}\,dX\Big], \tag{14.44}$$

with $p(X) = c(X) + e^{-rT}(X - F)$ defining the prices of put options. This
equation can provide informative volatility forecasts (Jiang and Tian 2004),
which are discussed in Section 15.7. The equation also underpins futures
trading on volatility at the Chicago Board Options Exchange (CBOE).
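
The following sketch illustrates how the right-hand side of (14.44) might be evaluated numerically. It is only a check under stated assumptions: option prices are generated by the Black-Scholes formula with a constant volatility (so the expected integrated variance should be close to $\sigma^2 T$), the integral is truncated to a finite range of exercise prices, and a simple trapezoidal rule is used. The function names and the numerical inputs are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(F, X, sigma, r, T):
    """Black-Scholes call price expressed with the forward price F."""
    s = sigma * math.sqrt(T)
    d1 = math.log(F / X) / s + 0.5 * s
    return math.exp(-r * T) * (F * norm_cdf(d1) - X * norm_cdf(d1 - s))

def expected_integrated_variance(call_price, F, r, T, x_lo, x_hi, n=20000):
    """Trapezoidal approximation of the right-hand side of (14.44).

    The integral over (0, infinity) is truncated to [x_lo, x_hi]. Put prices
    come from put-call parity, p(X) = c(X) + exp(-rT)(X - F).
    """
    step = (x_hi - x_lo) / n
    total = 0.0
    for i in range(n + 1):
        X = x_lo + i * step
        price = call_price(X)
        if X < F:  # below the forward price the put price is integrated
            price += math.exp(-r * T) * (X - F)
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * price / X ** 2
    return 2.0 * math.exp(r * T) * total * step

# Hypothetical inputs: constant volatility of 20% and a three-month horizon.
F, r, T, sigma = 100.0, 0.05, 0.25, 0.2
estimate = expected_integrated_variance(
    lambda X: bs_call(F, X, sigma, r, T), F, r, T, x_lo=20.0, x_hi=500.0)
print(estimate, sigma ** 2 * T)
```

With market data the function `call_price` would have to be replaced by an interpolation of observed prices, and the choice of truncation points then matters far more than it does in this constant-volatility illustration.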


14.6 Closed-Form Stochastic Volatility Option Prices 


Let $P_1 = N(d_1)$ and $P_2 = N(d_2)$ be the two probabilities that appear in
the Black-Scholes call formula, so that


$$c_{BS} = S e^{-qT} P_1 - X e^{-rT} P_2. \tag{14.45}$$


A call option expires in-the-money when $S_T > X$. This event has probability
$P_2$ for the risk-neutral measure $Q$, when prices follow a GBM process. The
event occurs with the higher probability $P_1$ for another measure $Q^*$, for
which the drift rate of the GBM process is increased from $r - q$ to
$r - q + \sigma^2$. Expressions like (14.45) can be derived when the price
dynamics are more complicated than GBM and, in some cases, the two
probabilities can be calculated very rapidly by numerical methods. The
resulting “closed-form” solutions can be useful for pricing options and hence
for interpreting implied volatilities. They also provide a methodology for
extracting implied risk-neutral densities from observed option prices, to be
discussed at the end of Section 16.5.
14.6.1  Square-Root Volatility Processes


Heston (1993) provides the first rigorous SV option pricing formula that can be
evaluated rapidly. The risk-neutral dynamics, for measure $Q$, are given by


$$Y_t = \log(S_t), \qquad V_t = \sigma_t^2,$$
$$dY = (r - q - \tfrac{1}{2}V)\,dt + \sqrt{V}\,dW, \tag{14.46}$$
$$dV = (a - bV)\,dt + \xi\sqrt{V}\,dZ, \tag{14.47}$$


and correlation $\rho$ between $dW$ and $dZ$. Heston proves that the fair price
of a call option is

$$c = S e^{-qT} P_1 - X e^{-rT} P_2, \tag{14.48}$$


for probabilities that are given by integrals that incorporate complex-valued
functions. The term $P_2$ is derived from the characteristic function (c.f.) of
$Y_T$ for measure $Q$, while $P_1$ is derived from the c.f. of $Y_T$ for a
measure $Q^*$ that is applicable after changing the drift rates in both (14.46)
and (14.47). The equations for $P_1$ and $P_2$ are given in the appendix to this
chapter. Heston (1993) provides numerical examples and illustrates the separate
influences of the parameters $\xi$ and $\rho$ on the theoretical price $c$.
Further examples and a discussion of return distributions are included in Das
and Sundaram (1999).

Several people have described the characteristic function of $Y_T$ for more general
risk-neutral dynamics than the square-root stochastic volatility (SRSV) specifica- 
tion given by (14.46) and (14.47). They then derive probabilities and hence more 
general option prices. Bates (1996) modifies SRSV by adding a jump component 
to the right-hand side of (14.46) to define SRSVJ dynamics. He shows this can 
explain exchange rate smiles, when there are occasional large jumps. 

Bakshi, Cao, and Chen (1997) and Scott (1997) extend the SRSVJ dynamics 
to incorporate a stochastic short-term interest rate. Bakshi et al. obtain the
characteristic function of $Y_T$ when $r$ also follows a square-root diffusion process,
as in Cox et al. (1985). Bakshi et al. assess the relative usefulness of stochastic 
volatility, price jump, and stochastic interest-rate features when pricing and hedg- 
ing options. They find that stochastic volatility is the most important feature and 
that price jumps are also useful when pricing short-term options. Subsequently, 
Bakshi, Cao, and Chen (2000) give further results for long-term equity options. 
Das and Sundaram (1999) also argue that both stochastic volatility and jump 
components are required to explain observed implied volatilities. 

Duffie, Pan, and Singleton (2000) provide an impressive general analytic treat- 
ment of valuation problems that use transformations such as characteristic func- 
tions. Their results are for so-called affine jump-diffusions, which include the 
SRSV dynamics as a special case. These affine models have coefficients that are 
linear functions of the state variables. For example, in (14.46) and (14.47) the 
terms that multiply $dt$ and the squares of the terms that multiply $dW$ and $dZ$ are
all linear functions of the state vector $(Y_t, V_t)'$. Their framework permits jumps in
volatility (which can be contemporaneous with jumps in the price) and multiple 
factors in the volatility process, in addition to all of the other features already 
mentioned. Applications include the pricing of exotic options. They illustrate the 
match between theoretical and observed option prices when volatility is stochas- 
tic, for various jump specifications. 


14.6.2 Other Processes 


Zhu (2000) also provides a detailed analysis of closed-form option pricing formu- 
lae derived from characteristic functions. Results are given for a variety of price 
dynamics, including an Ornstein-Uhlenbeck (OU) diffusion process for volatility. 
The OU analysis supplements option pricing studies for this process by Stein and 
Stein (1991), Ball and Roma (1994), and Schöbel and Zhu (1999). Closed-form
solutions are also available when the OU process for volatility has increments 
defined by a jump process, as shown by Barndorff-Nielsen and Shephard (2001). 
Further pricing results for their jump process are given by Barndorff-Nielsen, 
Nicolato, and Shephard (2002) and by Nicolato and Venardos (2003). 

A general theoretical framework for “closed-form” option prices is presented 
by Carr and Wu (2004). They obtain option prices for time-changed Lévy pro- 
cesses using characteristic functions, and describe several examples for asset price 
dynamics that include stochastic volatility and/or jumps. Empirical comparisons 
of some of their option pricing formulae are provided by Huang and Wu (2004). 


14.6.3 Empirical Comparisons between Real-World and Risk-Neutral Dynamics 


Various model parameters are theoretically identical for the real-world measure
$P$ and the risk-neutral measure $Q$, as explained in Section 14.5. For the
Heston (1993) model, the theoretically identical parameters are the “volatility
of volatility” $\xi$ and the correlation $\rho$ between the price and volatility
differentials. Some researchers have shown that time series ($P$) and option
price ($Q$) parameter estimates are inconsistent, which implies the asset price
dynamics are mis-specified, while others have attempted to use time series and
options data to jointly estimate parameters. Bates (2003) reviews and discusses
this literature.

Bates (1996, 2000) finds that the Heston dynamics are mis-specified for the
DM/$ exchange rate and for S&P 500 futures prices, because the parameter ξ
implied by option prices is much more than the time series estimate.
Consequently, he investigates more complicated price dynamics that include
jumps. Bakshi et al. (1997) use S&P 500 spot and options prices to obtain the
same conclusion about ξ. Their estimates of the correlation ρ obtained from
panels of option prices are substantial and negative (below −0.55), which
contrast with less negative values (around −0.25) obtained from time series of
asset returns and changes in implieds.

Joint estimation of real-world and risk-neutral dynamics has been accomplished
for the Heston dynamics and various extensions using S&P index data, usually
for the 500-share index, commencing with Chernov and Ghysels (2000). They
find that time series information does not enhance option pricing. Pan (2002)
uses joint data to estimate the risk premium for price jumps and finds that it is
correlated with volatility. Jones (2003) generalizes the volatility dynamics in two
ways: first the volatility differential dZ is multiplied by ξ times a general
power of V instead of ξ√V and second the correlation ρ is made a function of V.
Eraker (2004) obtains a more satisfactory description of spot and options data
by including volatility jumps in the price dynamics.


14.7 Option Prices for ARCH Processes 


Less attention has been given to option pricing for discrete-time ARCH processes, 
in comparison with research for continuous-time stochastic volatility processes. 
An attraction of ARCH methods is that observed prices for the underlying asset 
can be used to select the price dynamics. However, the discrete-time context of 
ARCH methods complicates theoretical analysis and excludes an independent 
volatility risk premium from the option pricing solutions. 


14.7.1 Theoretical Framework 


To obtain fair option prices in an ARCH framework, using a risk-neutral measure
$Q$, it is necessary to make additional assumptions. Duan (1995) and Bollerslev and
Mikkelsen (1999) provide sufficient conditions to apply a risk-neutral valuation 
methodology. For example, it is sufficient that a representative agent has constant 
relative risk aversion and that returns and aggregate growth rates in consump- 
tion have conditional normal distributions. Kallsen and Taqqu (1998) derive the 
same option pricing solution as Duan (1995) without making assumptions about 
utility functions and consumption. Instead, they assume that intraday prices are 
determined by a GBM with volatility determined once a day from a discrete-time 
ARCH model. 

The discrete timescale is now defined by trading periods, labeled by $t$. At
time 0 we wish to price European options that expire at time $n$ and we make
use of returns $r_t = \log(S_t) - \log(S_{t-1})$ that define information sets
$I_t = \{r_{t-j},\ j \ge 0\}$. Following the general framework of Section 9.5,
returns have conditional distributions for the real-world measure $P$ of the
form


$$r_t \mid I_{t-1} \overset{P}{\sim} N(\mu_t, h_t),$$


with

$$z_t = \frac{r_t - \mu_t}{\sqrt{h_t}} \overset{P}{\sim} \text{i.i.d. } N(0, 1). \tag{14.49}$$
Duan (1995) shows that the conditional distributions change for the risk-neutral
measure $Q$ to

$$r_t \mid I_{t-1} \overset{Q}{\sim} N(\rho - \delta - \tfrac{1}{2}h_t,\ h_t),$$

with

$$z_t^* = \frac{r_t - (\rho - \delta - \tfrac{1}{2}h_t)}{\sqrt{h_t}} \overset{Q}{\sim} \text{i.i.d. } N(0, 1). \tag{14.50}$$

The parameters $\rho$ and $\delta$ now denote the one-period risk-free and
dividend rates, such that one dollar and one share respectively grow to
$\exp(n\rho)$ dollars and $\exp(n\delta)$ shares after $n$ trading periods.

As in Section 14.5 for SV processes, the change in the expected return as the
measure changes reflects the risk premium for the asset. Unlike the SV case, there
is not an independent change in the drift rate of the volatility equation; the same
functions $h_t$ appear in (14.49) and (14.50).

The theoretical fair price of a call option is once more given by the present
value of its expected terminal price; thus,


$$c = e^{-\rho n} E^Q[c_n \mid I_0] = e^{-\rho n} E^Q[\max(S_n - X, 0) \mid I_0]. \tag{14.51}$$


This expectation can be evaluated by Monte Carlo methods after specifying the
functions $\mu_t$ and $h_t$. A closed-form expression for the expectation is only known
for a contrived ARCH specification given by Heston and Nandi (2000). Price and 
volatility shocks are perfectly correlated for the diffusion limit of their discrete- 
time specification, unlike the general correlation permitted in the related SV 
closed-form formula of Heston (1993). 

The conditional market price of risk for the underlying asset is


$$\frac{\mu_t - (\rho - \delta - \tfrac{1}{2}h_t)}{\sqrt{h_t}}. \tag{14.52}$$


Following most implementations of ARCH option pricing, we now assume that
this quantity is some constant $\lambda$. From (14.49) and (14.50), the
standardized returns for the two measures are then related by


$$z_t^* = z_t + \lambda. \tag{14.53}$$


14.7.2 GARCH(1, 1)


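The functions $\mu_t$ and $h_t$ are as follows for the popular GARCH(1, 1) model,
when we assume a constant market price of risk:


$$\mu_t = \rho - \delta - \tfrac{1}{2}h_t + \lambda\sqrt{h_t},$$

and

$$h_t = \omega + \alpha(r_{t-1} - \mu_{t-1})^2 + \beta h_{t-1} = \omega + (\alpha z_{t-1}^2 + \beta)h_{t-1}. \tag{14.54}$$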
Monte Carlo simulation of returns, using measure $Q$, is achieved by knowing
$h_1$ and then evaluating


$$z_t^* \sim N(0, 1),$$
$$r_t = \rho - \delta - \tfrac{1}{2}h_t + \sqrt{h_t}\,z_t^*, \tag{14.55}$$
$$h_{t+1} = \omega + (\alpha(z_t^* - \lambda)^2 + \beta)h_t,$$


for $1 \le t \le n$. The asset price at expiry is
$S_n = S_0\exp(r_1 + \cdots + r_n)$. For $N$ simulations that deliver terminal
asset prices $\{S_{i,n},\ 1 \le i \le N\}$, the fair option price can be
estimated by


$$\hat{c} = \frac{e^{-\rho n}}{N}\sum_{i=1}^{N}\max(S_{i,n} - X, 0). \tag{14.56}$$


The efficiency of the estimate can be improved by standard antithetic and control 
variate methods (Duan 1995; Taylor 2002). Alternative computational method- 
ologies are provided by the series approximation of Duan, Gauthier, and Simonato 
(1999) and the efficient lattice algorithm of Ritchken and Trevor (1999). Duan 
(1995) discusses numerical examples in his paper that can be used to check imple- 
mentations of his option pricing formula. 
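
The simulation steps in (14.55) and (14.56) can be sketched as follows. This is a minimal illustration rather than Duan's own implementation: it uses plain Monte Carlo without the antithetic or control variate refinements, and the function name, the seed, and all numerical parameter values are hypothetical.

```python
import math
import random

def garch_call_price(S0, X, n, h1, omega, alpha, beta, lam,
                     rho=0.0, delta=0.0, n_paths=20000, seed=12345):
    """Plain Monte Carlo estimate (14.56) of a European call price.

    Returns are simulated under Q using (14.55). rho and delta are the
    one-period risk-free and dividend rates, lam is the constant market
    price of risk and h1 is the first conditional variance.
    """
    rng = random.Random(seed)
    payoff_sum = 0.0
    for _path in range(n_paths):
        h = h1
        log_return = 0.0
        for _t in range(n):
            z_star = rng.gauss(0.0, 1.0)                     # z*_t ~ N(0, 1)
            log_return += rho - delta - 0.5 * h + math.sqrt(h) * z_star
            h = omega + (alpha * (z_star - lam) ** 2 + beta) * h
        payoff_sum += max(S0 * math.exp(log_return) - X, 0.0)
    return math.exp(-rho * n) * payoff_sum / n_paths

# Hypothetical daily parameters for an at-the-money option with 20 days left.
price = garch_call_price(S0=100.0, X=100.0, n=20, h1=1e-4,
                         omega=1e-6, alpha=0.05, beta=0.9, lam=0.05)
print(price)
```

One simple check of such code is to set $\alpha = \beta = 0$ and $h_1 = \omega$, when the variance is constant and the estimate should agree with the Black-Scholes price up to simulation error.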


14.7.3 Long Memory ARCH 


The evidence for long memory effects in volatility, noted in Sections 10.3, 11.10,
and 12.9, motivates incorporating these effects into option pricing methods.
Bollerslev and Mikkelsen (1996, 1999) apply the theoretical framework for ARCH
option pricing to the FIEGARCH(1, d, 1) specification. This specification is
defined by equation (10.17) and option prices are obtained by using (10.19) and
(14.53) to define the risk-neutral dynamics. Bollerslev and Mikkelsen's
methodology is followed in Taylor (2002), from which we obtain the illustrative
results for S&P 100 index options shown in Figures 14.8-14.10. These figures
summarize features of the ten sets of implied volatilities when ARCH option
prices are calculated on the final trading days of the ten years from 1989 to
1998 inclusive.
The first two figures are for the EGARCH(1) special case when d = y = 0, 
with persistence parameter A — 0.982, and asymmetry parameter 


y—v 
zi m 
estimated from daily returns. This special case is a short memory specification
and all the relevant information in the history $I_0$ is provided by the
conditional variance $h_1$. Figure 14.8 shows the volatility term structures for
ATM options with lives up to two years. The shapes are monotonic and commence at
values given by $h_1$. Figure 14.9 shows the corresponding “smiles” for
three-month options when the index series is rescaled so that $S_0 = 100$ when
the options are priced. These skewed shapes are a consequence of the strong
asymmetry in the volatility process. Figure 14.10 shows the corresponding term
structures for the long memory case when $d = 0.4$. These shapes depend on the
entire history $I_0$, which has been shortened to the previous 1000 daily
returns. Some of the long memory shapes are not monotonic, there are some
intersections, and the convergence to a limiting value is extremely slow. The
“smile” shapes for the short and long memory cases are very similar and these
functions of the exercise price $X$ are almost parallel to each other.

Figure 14.8. Ten volatility term structures for a short memory process.

Figure 14.9. Ten smile shapes for three-month options and a short memory process.


Figure 14.10. Ten volatility term structures for a long memory process with d = 0.4.


14.8 Summary


Rational option prices depend on the stochastic process followed by the price
of the underlying asset. Option prices can be derived by a risk-neutral valuation
methodology for many stochastic processes, including the interesting cases when
volatility is itself stochastic. Stochastic volatility explains why the Black-Scholes
formula can only approximate observed option prices. It also explains the general
forms of the patterns seen in empirical implied volatilities when they are viewed
as functions of either the exercise price or the remaining time until an option
expires.
14.9 Appendix: Heston's Option Pricing Formula


The option pricing formula of Heston (1993) requires two calculations of the
probability that $Y_T = \log(S_T)$ exceeds $\log(X)$ when the state vector
$(Y_t, V_t)'$ has initial value $(Y_0, V_0)'$ and dynamics


$$dY = (R + uV)\,dt + \sqrt{V}\,dW, \tag{14.57}$$
$$dV = (a - cV)\,dt + \xi\sqrt{V}\,dZ. \tag{14.58}$$


The terms $a$, $c$, $R$, $u$, $\xi$ are parameters, as is the correlation $\rho$
between the Wiener processes $\{W_t\}$ and $\{Z_t\}$.

The first probability is $P_2$, obtained for the risk-neutral measure $Q$ when
$R = r - q$, $u = -\tfrac{1}{2}$, and $c = b$, which gives the price dynamics in
(14.46) and (14.47). The other probability in the option pricing formula (14.48)
is $P_1$ given by the measure $Q^*$ that is applicable when $R = r - q$,
$u = \tfrac{1}{2}$, and $c = b - \rho\xi$.

Probabilities are obtained from the conditional characteristic function of
$Y_T$, here denoted by $g(\phi)$ and defined for all real numbers $\phi$. With
$i = \sqrt{-1}$,

$$g(\phi) = E[e^{i\phi Y_T} \mid Y_0, V_0], \tag{14.59}$$

which is a complex-valued function. Heston (1993) solves PDEs to obtain

$$g(\phi) = \exp(C + DV_0 + i\phi Y_0) \tag{14.60}$$

with $C$ and $D$ calculated from long equations that can be stated as

$$C = RT\phi i + \frac{a}{\xi^2}\left[hT - 2\log\left(\frac{1 - ke^{dT}}{1 - k}\right)\right],$$

$$D = \frac{h(1 - \exp(dT))}{\xi^2(1 - k\exp(dT))},$$

with

$$d = [(\rho\xi\phi i - c)^2 - \xi^2(2u\phi i - \phi^2)]^{1/2},$$
$$h = c - \rho\xi\phi i + d, \tag{14.61}$$
$$k = \frac{h}{h - 2d}.$$

The conditional probability that $Y_T$ exceeds $\log(X)$ is given by a standard
inversion formula:


$$P(Y_T > \log(X) \mid Y_0, V_0) = \frac{1}{2} + \frac{1}{\pi}\int_0^\infty \mathrm{Re}\left[\frac{e^{-i\phi\log(X)}g(\phi)}{i\phi}\right]d\phi \tag{14.62}$$

with $\mathrm{Re}[\cdot]$ representing the real part of a complex number
(Kendall, Stuart, and Ord 1987). The integral can be evaluated rapidly and
accurately by numerical methods. Option prices for a set of $X$-values can be
calculated together, as the same values of $g(\phi)$ are required for all $X$.

Equation (14.62) provides the conditional cumulative distribution function of
$S_T$; thus,

$$F(x) = P(S_T \le x \mid Y_0, V_0) = 1 - P(Y_T > \log(x) \mid Y_0, V_0).$$

The conditional density of $S_T$ is consequently

$$f(x) = \frac{1}{\pi x}\int_0^\infty \mathrm{Re}[\exp(-i\phi\log(x))g(\phi)]\,d\phi. \tag{14.63}$$
Forecasting Volatility


Several methods for forecasting volatility are reviewed in this chapter. Forecasts 
derived from option prices and intraday asset prices are of particular interest— 
they incorporate more volatility information than the history of daily asset prices 
and they provide superior predictions. 


15.1 Introduction 


Forecasts of volatility are important when assessing and managing the risks of 
portfolios that may include derivative securities. A remarkable variety of meth- 
ods have been used and the conclusions obtained often appear to be contradictory. 
This variety reflects the fact that volatility is inherently unobservable, so that fore- 
casts must be made of related observable quantities. It also reflects the increasing 
difficulty of the forecasting task as the forecast horizon increases. 

These issues have led some people to question the possibility of making useful 
volatility forecasts, particularly more than two weeks into the future, despite the 
descriptive success of ARCH and stochastic volatility models. However, much 
of the evidence that forecasts are inaccurate is attributable to the use of a very 
noisy proxy for volatility, such as squared daily returns (Andersen and Bollerslev 
1998b). Useful forecasts of both equity and exchange rate volatility can be made, 
at least one month into the future, when realized volatility is measured accurately 
using high-frequency prices (e.g. Blair et al. 2001b; Li 2002; Martens and Zein 
2004; Pong et al. 2004). 

Option prices are a source of valuable information when forecasting volatility. 
Option traders incorporate historical price information, and further information 
about future events which influences volatility, into option prices. These prices 
have the potential to provide the best forecasts when option markets are efficient 
and future volatility is the only relevant unknown variable. Consequently, this 
chapter concentrates on forecasts that make use of implied volatilities. The supe- 
rior out-of-sample accuracy of such forecasts, compared with predictions given 
by the history of daily asset prices, is now well documented (e.g. Xu and Tay- 
lor 1995; Blair et al. 2001b; Ederington and Guan 2002b). This conclusion does 
not generally hold, however, when the asset price history is expanded to include
intraday asset prices. 

Poon and Granger (2003) provide a comprehensive survey of recent volatility 
forecasting studies. Most studies only predict the volatility of one asset or port- 
folio. Alexander (2001) covers the more general problem of predicting variances 
and covariances, within a multivariate context. 

Sections 15.2 and 15.3 cover forecasting methodology. Of major importance 
are definitions of the volatility target that is to be predicted and measures of 
how well the predictions agree with subsequent outcomes. Methods that only use 
the history of asset prices are defined and reviewed in Section 15.4. The more 
interesting methods that also use option prices are covered in Sections 15.5-15.7. 
The construction of a forecast from the many available implied volatilities is 
first discussed, followed by low-frequency applications of regression and ARCH 
methods. These are followed by comparisons of historical and option forecasts 
of volatility, when high-frequency data are used to both forecast and measure 
volatility. 


15.2 Forecasting Methodology 


Forecasting tasks that are repeated through time use information $I_t$ known at
time $t$ to produce a forecast $f_{t+H}$ of some target quantity $y_{t+H}$ that
is observed at the later time $t + H$. Most volatility forecasting exercises use
a daily timescale, so that $t$ counts trading days. This will be assumed in this
methodological section. Initially, it is also supposed that the forecast horizon
$H$ is simply one day. To simplify the discussion it is also assumed that
expected returns are zero. It is easy to adapt the following methods when
expected returns are either constant or some function of the information $I_t$.


15.2.1 The Volatility Target 


Some proxy for volatility must be forecast, because volatility cannot be observed.
Measures of realized volatility are often employed. One popular choice is the
squared return, so $y_{t+1} = r_{t+1}^2$. This has the theoretical advantage,
explained later in this section, that optimal forecasts of squared returns can
also be optimal forecasts of the unobserved squared volatility,
$\sigma_{t+1}^2$. However, return outliers are amplified when they are squared
and then forecast errors are typically very large compared with other times.
Consequently, another popular proxy is the absolute return, so that
$y_{t+1} = |r_{t+1}|$. Forecasts of $|r_{t+1}|$ can then be scaled to deliver
unbiased forecasts of $\sigma_{t+1}$.

Any realized volatility measure calculated from one daily return will be a noisy
estimate of that day's volatility. Less noisy estimates can be obtained from intraday
prices. The daily range,


$$y_{t+1} = \log(p_{t+1}^{\text{high}}) - \log(p_{t+1}^{\text{low}}), \tag{15.1}$$

calculated from daily high and low prices, $p_{t+1}^{\text{high}}$ and
$p_{t+1}^{\text{low}}$, is one possibility that has been used in forecasting
studies (e.g. Taylor 1987). Another is the scaled squared range of Parkinson
(1980), which is defined by equation (12.44).
High-frequency datasets are very useful for defining and evaluating volatility
forecasts. Andersen and Bollerslev (1998b) explain the advantages of a volatility
target defined by the realized volatility measure of Sections 12.8 and 12.9, namely


$$y_{t+1} = \hat{\sigma}_{t+1}^2 = \sum_{k=1}^{N} r_{t+1,k}^2 \tag{15.2}$$


with $N$ the number of intraday returns $r_{t+1,k}$, whose sum is $r_{t+1}$.
Forecasts of this target have been evaluated in several recent research papers,
for example, Taylor and Xu (1997), Blair et al. (2001b), and Andersen et al.
[ABDL] (2003).
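
The three return-based proxies mentioned above can be compared on a single day with a few lines of code. This is only an illustration: the function name and the six intraday returns are hypothetical, chosen so that the intraday moves largely cancel out.

```python
def volatility_proxies(intraday_returns):
    """Return three proxies for one day's volatility.

    intraday_returns holds the N intraday log returns r_{t+1,k}, whose sum
    is the daily return r_{t+1}.
    """
    daily_return = sum(intraday_returns)
    return {
        "squared_return": daily_return ** 2,
        "absolute_return": abs(daily_return),
        "realized_variance": sum(r * r for r in intraday_returns),  # (15.2)
    }

# Hypothetical day: six intraday returns that largely offset each other.
day = [0.004, -0.003, 0.002, -0.004, 0.003, -0.001]
proxies = volatility_proxies(day)
print(proxies)
```

For this invented day the squared daily return is far smaller than the realized variance, which illustrates why a proxy built from one daily return is noisy.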

Implied volatility (IV) may be a suitable volatility target if forecasting is 
intended to enhance option hedging strategies. However, it is only possible to 
use the best forecast of IV to infer the best forecast of a latent volatility variable if 
several assumptions are made; these include a correctly specified option pricing 
formula. 


15.2.2 Information and Forecasts 


The information set $I_t$ is selected by the forecaster. The set might only
contain a history of asset prices. Typical forecasts are then conditional
variances defined by ARCH models. These and other time series forecasts are
reviewed in Section 15.4. Alternatively, the set might only contain implied
volatilities. Indeed, one option price at time $t$ might suffice to identify the
optimal volatility forecast, assuming an efficient options market. The most
interesting research studies use both an asset price history and implied
volatilities to compare the predictive accuracy of these sources of volatility
information. Several examples are reviewed in Sections 15.5-15.7.


15.2.3 Loss Functions 


The accuracy of forecasts is evaluated by some loss function,
$L(y_{t+1}, f_{t+1})$, that represents the penalty or cost when we predict
$f_{t+1}$ but the outcome is $y_{t+1}$. The forecaster's objective is then to
minimize the expected loss. Loss functions are often some simple function of the
forecast error, $e_{t+1} = y_{t+1} - f_{t+1}$, that is motivated by statistical
convenience. Two examples are the squared error, $e_{t+1}^2$, and the absolute
error $|e_{t+1}|$. The assumed symmetry between positive and negative forecast
errors could be inappropriate. When it is more costly to underpredict volatility
than to overpredict it, the LINEX function can be used,


$$L(y_{t+1}, f_{t+1}) = \exp(A e_{t+1}) - 1 - A e_{t+1}, \tag{15.3}$$

for some positive parameter $A$ (Granger 1999; Hwang, Knight, and Satchell 2001).
Another statistical loss measure is the squared proportional error,


$$L(y_{t+1}, f_{t+1}) = \left(\frac{y_{t+1} - f_{t+1}}{f_{t+1}}\right)^2, \tag{15.4}$$


which is robust against heteroskedasticity in the forecast errors.
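
The four statistical loss functions of this section can be written down directly. The sketch below is illustrative only: the function names are hypothetical and the default value of the LINEX parameter is an arbitrary choice.

```python
import math

def squared_error(y, f):
    """Squared forecast error, with e = y - f."""
    return (y - f) ** 2

def absolute_error(y, f):
    """Absolute forecast error."""
    return abs(y - f)

def linex(y, f, A=1.0):
    """LINEX loss (15.3); A > 0 penalizes underprediction (y > f) more."""
    e = y - f
    return math.exp(A * e) - 1.0 - A * e

def squared_proportional_error(y, f):
    """Squared proportional error (15.4); the forecast f must be non-zero."""
    return ((y - f) / f) ** 2

# The same absolute error of one unit, first underpredicted then overpredicted.
print(linex(2.0, 1.0), linex(0.0, 1.0))
```

The two printed values differ although the absolute error is the same, which is the asymmetry that motivates the LINEX function.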

Economic loss functions, which measure the impact of forecasting mistakes 
upon financial decisions, may be more relevant. For example, West, Edison, and 
Cho (1993) make use of utility functions, Fleming, Kirby, and Ostdiek (2001) 
relate volatility forecasts to portfolio weights, and Engle and Rosenberg (1995) 
emphasize hedging errors. 

One forecasting method is more accurate than another if its average loss is 
less. Tests of the null hypothesis that methods have identical expected losses are 
reviewed by Diebold and Mariano (1995). 


15.2.4 In- and Out-of-Sample 


Many forecasts depend on parameters. There are several, for example, in ARCH 
specifications for the next day's variance. These parameters are sometimes esti- 
mated from data spanning several years and then applied to the calculation of 
forecasts during the same time period. This is called in-sample forecasting. It 
may not deliver realistic results about forecast accuracy (e.g. Dimson and Marsh 
1990). Likewise, future information should not be used to guide the selection of a 
parametric specification. The preferred methodology is to use only data until time
$t$ to obtain the parameters and functional form of a forecast $f_{t+1}$. The forecasts
are then out-of-sample. The parameters may then be re-estimated once a day from 
a sample of fixed size, thus defining rolling parameter values. 
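
The rolling, out-of-sample scheme can be sketched as follows. To keep the example short, the "parameter" that is re-estimated each day is simply the average squared return over the most recent observations, which is a stand-in for the estimation of an ARCH specification. The function name, the window length and the returns are hypothetical.

```python
def rolling_forecasts(returns, window):
    """One-step-ahead variance forecasts from a rolling sample of fixed size.

    The forecast for period t + 1 uses only returns[t - window + 1 .. t],
    so no information from after time t is used.
    """
    forecasts = {}
    for t in range(window - 1, len(returns) - 1):
        sample = returns[t - window + 1 : t + 1]
        forecasts[t + 1] = sum(r * r for r in sample) / window
    return forecasts

# Hypothetical returns, indexed from time 0.
returns = [0.01, -0.02, 0.03, -0.04, 0.02]
print(rolling_forecasts(returns, window=3))
```

Changing the final return in the list leaves every forecast unchanged, because the last observation is only ever a target and never an input.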


15.2.5 General Forecast Horizons 


The target quantity $y_{t+H}$, for a general horizon of $H$ days, could simply
measure volatility at the future time, e.g. $y_{t+H} = |r_{t+H}|$. It is much
more usual, however, to predict some measure of the total volatility from times
$t + 1$ to $t + H$ inclusive. This is particularly appropriate if we use the
forecast to price an option that expires after $H$ days.

Typical targets are the sum of $H$ realized volatilities. Low- and high-frequency
examples are


$$y_{t+H} = \sum_{j=1}^{H} r_{t+j}^2 \quad\text{and}\quad y_{t+H} = \sum_{j=1}^{H}\sum_{k=1}^{N} r_{t+j,k}^2. \tag{15.5}$$


Realized standard deviations are also predicted, defined perhaps by


$$y_{t+H} = \sqrt{\frac{1}{H}\sum_{j=1}^{H} r_{t+j}^2}. \tag{15.6}$$

These are biased estimates of the theoretical quantity defined by replacing
$r_{t+j}$ by $\sigma_{t+j}$. The bias can be substantial when $H$ is small but
it can perhaps be ignored when $H$ is at least ten (Fleming 1998, p. 324);
equation (15.12) quantifies the bias when the horizon $H$ is a single day.

It is theoretically incorrect to define $H$ day forecasts of the targets in
(15.5) by the simple scaling rule


$$f_{t+H} = H f_{t+1}, \tag{15.7}$$


when volatility is a mean-reverting process. Instead, $f_{t+H}$ should be nearer
than $H f_{t+1}$ to the long-run average level of $y_{t+H}$, as illustrated
later by equation (15.23), if we disregard the possibility of future changes in
this level.
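
The effect of mean reversion on multi-day forecasts can be illustrated with a GARCH(1, 1) model, for which the expected variance $j$ days ahead is the long-run variance plus $(\alpha + \beta)^{j-1}$ times the current deviation from it. This standard result is used here as an assumption of the sketch and is not a restatement of equation (15.23); the function name and parameter values are hypothetical.

```python
def garch_horizon_forecast(h_next, omega, alpha, beta, H):
    """Forecast of the sum of H daily variances under GARCH(1, 1).

    h_next is the one-day-ahead conditional variance. Each later variance is
    expected to decay geometrically towards omega / (1 - alpha - beta).
    """
    persistence = alpha + beta
    long_run = omega / (1.0 - persistence)
    return sum(long_run + persistence ** (j - 1) * (h_next - long_run)
               for j in range(1, H + 1))

# Hypothetical case: current variance is four times its long-run level.
h_next, omega, alpha, beta, H = 4e-4, 1e-6, 0.05, 0.94, 20
forecast = garch_horizon_forecast(h_next, omega, alpha, beta, H)
naive = H * h_next                              # scaling rule (15.7)
long_run_total = H * omega / (1.0 - alpha - beta)
print(long_run_total, forecast, naive)
```

In this example the mean-reverting forecast lies between the long-run total and the value given by the scaling rule, as the text requires.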

Daily observations of the $H$-day forecast errors, $e_{t+H} = y_{t+H} - f_{t+H}$,
will be autocorrelated, up to lag $H - 1$. There are $H - 1$ shared days in the
forecast horizons commencing at times $t$ and $t + 1$, and surprises during
these days will have a similar impact upon the errors $e_{t+H}$ and
$e_{t+H+1}$. This dependence reduces the power of forecast comparisons. The
standard errors of regression coefficients need to be adjusted, as in Jorion
(1995) and many other studies, for example, by using the techniques of Hansen
(1982) and Newey and West (1987).


15.3 Two Measures of Forecast Accuracy 
15.3.1 Minimizing Mean Squared Errors 


The mean squared error (MSE) criterion is often preferred when forecasts are
evaluated, because the conditional mean of the target is then the optimal forecast.
For forecasts $f_{t+1}^{(j)}$ made at times $t_1 \le t \le t_2$, by methods
indexed by $j$, the empirical MSEs are


$$\mathrm{MSE}(j) = \frac{1}{t_2 - t_1 + 1}\sum_{t=t_1}^{t_2}\big(y_{t+1} - f_{t+1}^{(j)}\big)^2. \tag{15.8}$$


A useful measure of forecast accuracy is the proportion of the variability of the
outcomes $y_{t+1}$ explained by the forecasts, which is the linear function of MSE
defined by


$$P(j) = 1 - \frac{\sum \big(y_{t+1} - f_{t+1}^{(j)}\big)^2}{\sum (y_{t+1} - \bar{y})^2} \tag{15.9}$$

with $\bar{y}$ the average outcome. The best method for observed data has the
minimum value of MSE($j$) and the maximum value of $P(j)$. Ideally, this will
also happen to be the method that has the minimum value of


$$E\big[\big(y_{t+1} - f_{t+1}^{(j)}\big)^2 \mid I_t\big]$$


for all possible information in the sets $I_t$.
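
The two empirical criteria (15.8) and (15.9) can be computed as in the following sketch. The function names and the small data set are hypothetical.

```python
def mse(outcomes, forecasts):
    """Empirical mean squared error (15.8) for one forecasting method."""
    return sum((y - f) ** 2 for y, f in zip(outcomes, forecasts)) / len(outcomes)

def proportion_explained(outcomes, forecasts):
    """Proportion P of the variability of outcomes explained, as in (15.9)."""
    y_bar = sum(outcomes) / len(outcomes)
    total = sum((y - y_bar) ** 2 for y in outcomes)
    errors = sum((y - f) ** 2 for y, f in zip(outcomes, forecasts))
    return 1.0 - errors / total

# Hypothetical outcomes and three sets of forecasts.
y = [1.0, 2.0, 3.0, 4.0]
print(proportion_explained(y, y))                     # perfect forecasts
print(proportion_explained(y, [2.5, 2.5, 2.5, 2.5]))  # always the average
print(proportion_explained(y, [4.0, 3.0, 2.0, 1.0]))  # worse than the average
```

The third case shows that $P$ is negative when forecasts are less accurate than the average outcome, so $P$ is not restricted to the range from zero to one.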
The theoretical problem of using $I_t$ to find the number $f_{t+1}$ that
minimizes $E[(y_{t+1} - f_{t+1})^2 \mid I_t]$ is solved by
$f_{t+1} = E[y_{t+1} \mid I_t]$, whenever the conditional variance of $y_{t+1}$
is finite. The optimality of the conditional mean allows us to relate optimal
forecasts of realized volatility $y_{t+1}$ to optimal forecasts of a volatility
variable, such as $\sigma_{t+1}^2$.

Chapters 9-11 contain several return processes that have the factorization


$$r_t = \sigma_t u_t \tag{15.10}$$

in conjunction with the assumption that $u_{t+1}$ has zero mean, unit variance,
and is independent of
$\{\sigma_{t+1}, \sigma_t, \sigma_{t-1}, \ldots, u_t, u_{t-1}, \ldots\}$. For
these processes,

$$E[r_{t+1}^2 \mid I_t] = E[\sigma_{t+1}^2 \mid I_t]\,E[u_{t+1}^2 \mid I_t] = E[\sigma_{t+1}^2 \mid I_t] \tag{15.11}$$


when $I_t$ is any history of returns. Equation (15.11) remains true for daily
returns when the history includes intraday returns that have a similar
factorization. It is also true when $I_t$ contains options prices, providing we
make the reasonable assumption that these prices contain no information about
$u_{t+1}$. The optimal forecasts of $y_{t+1} = r_{t+1}^2$ and $\sigma_{t+1}^2$
are then identical. Seeking the best forecast of $r_{t+1}^2$ is therefore a
constructive way to seek the best forecast of $\sigma_{t+1}^2$.

Likewise, when $y_{t+1} = |r_{t+1}|$, there is the result


$$E[|r_{t+1}| \mid I_t] = E[|u_{t+1}|]\,E[\sigma_{t+1} \mid I_t] \tag{15.12}$$


so the optimal forecasts of $|r_{t+1}|$ and $\sigma_{t+1}$ are proportional to
each other; the term $E[|u_{t+1}|]$ equals $\sqrt{2/\pi}$ when $u_{t+1}$ has a
normal distribution. Forecasting $\sigma_{t+1}$ has the advantage that a finite
unconditional MSE is obtained when the process for returns has a finite
variance. In contrast, the unconditional MSE is infinite for forecasts of
$\sigma_{t+1}^2$ when the returns process has infinite kurtosis.

A proportional relationship also exists between optimal forecasts of the price
range (see (15.1)) and $\sigma_{t+1}$, which is described and exploited in
Taylor (1987). There are similar results when the forecast target is the
intraday realized volatility $\hat{\sigma}_{t+1}^2$.


15.3.2 Correlation Measures 


Many volatility forecasting studies report the correlation \(R\) between a set of
forecasts \(f_{t+1}\) and a set of realized volatilities \(y_{t+1}\). Ranking forecasting methods by
their values of \(R^2\) is unsatisfactory, however, because the \(R^2\) criterion incorporates
a lookback bias. This can be seen by remembering that the empirical \(R^2\) is
the proportion of variance explained by the best linear combination, \(\alpha + \beta f_{t+1}\),
i.e.

\[ R^2 = \max_{\alpha,\beta} \left[ 1 - \frac{\sum (y_{t+1} - \alpha - \beta f_{t+1})^2}{\sum (y_{t+1} - \bar{y})^2} \right]. \tag{15.13} \]


Thus \(R^2\) provides a biased measure of forecast accuracy, because the best \(\alpha\)
and \(\beta\) for a forecasting method are only known after all the forecasts have been
evaluated. The ex post optimization of \(\alpha\) and \(\beta\) ensures that \(R^2 \ge P\), with \(P\) the
preferred proportion defined by (15.9). The criteria \(R^2\) and \(P\) may well produce
different ranks for a set of forecasting methods.
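The gap between \(R^2\) and \(P\) is easy to see with a deliberately biased forecast. The sketch below uses hypothetical numbers and helper names of our own choosing: the forecast is always exactly twice the outcome, so it is perfectly correlated with the outcomes yet has a large mean squared error.

```python
def proportion_explained(y, f):
    """Proportion P in (15.9), evaluated with the forecasts as they were issued."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, f))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst


def r_squared(y, f):
    """Squared sample correlation, which equals the ex post maximum in (15.13)."""
    n = len(y)
    y_bar = sum(y) / n
    f_bar = sum(f) / n
    s_yf = sum((yi - y_bar) * (fi - f_bar) for yi, fi in zip(y, f))
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_ff = sum((fi - f_bar) ** 2 for fi in f)
    return s_yf ** 2 / (s_yy * s_ff)


# Hypothetical outcomes and a biased forecast that doubles every outcome.
y = [1.0, 2.0, 3.0, 4.0]
f_biased = [2.0 * yi for yi in y]

print("R^2 =", r_squared(y, f_biased))           # high: perfect linear association
print("P   =", proportion_explained(y, f_biased))  # negative: worse than the sample mean
```

The correlation criterion rewards the forecast for information that could only be exploited after rescaling it with coefficients that are unknown when the forecasts are made.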

One interpretation of \(R^2\) is that it measures information content; higher values
indicate more association between the volatility forecast and the volatility
outcome. Nevertheless, a relatively high association may arise from a biased forecast
that would lead to relatively bad decisions if it was used ex ante. Tests of the joint
null hypothesis \(\alpha = 0\) and \(\beta = 1\) are often appropriate, although rejection of
the null may say nothing about the efficiency of markets. For example, the null
could be rejected for an implied volatility forecast when option prices include a
volatility risk premium, as will be discussed further in Section 15.5.

Empirical values of \(R^2\) are small when realized volatility is calculated from
one daily return. Suppose \(r_{t+1} = \sigma_{t+1} u_{t+1}\), with \(\sigma_{t+1}\) independent of \(u_{t+1}\), and
that \(u_{t+1}\) has zero mean, unit variance, and finite fourth moment, \(k_u = E[u_{t+1}^4]\).
Then the squared correlation for predictions of \(r_{t+1}^2\) is bounded as follows:

\[ R^2 = [\mathrm{cor}(r_{t+1}^2, f_{t+1})]^2 \le [\mathrm{cor}(r_{t+1}^2, \sigma_{t+1}^2)]^2 = \frac{\mathrm{var}(\sigma_{t+1}^2)}{\mathrm{var}(r_{t+1}^2)} \le \frac{1}{k_u}. \tag{15.14} \]

The upper bound is one-third when \(u_{t+1}\) has a normal distribution. Low values
of \(R^2\) are a consequence of \(r_{t+1}^2 = \sigma_{t+1}^2 u_{t+1}^2\) being a very noisy estimate of
\(\sigma_{t+1}^2\). Much of this noise can be eliminated by instead using the intraday realized
volatility \(\hat{\sigma}_{t+1}^2\), given in (15.2), and then much higher values of \(R^2\) can be obtained
(Andersen and Bollerslev 1998b).

When returns are generated by an ARCH process, the conditional variance
\(h_{t+1}\) equals \(\sigma_{t+1}^2\) and it defines the optimal forecast made at time \(t\). Andersen and
Bollerslev (1998b) derive \(R^2\) for the GARCH(1, 1) model when the forecast is
the conditional variance. With \(\alpha\) and \(\beta\) now referring to the GARCH parameters
used in Sections 9.3 and 9.6, and providing the returns have finite kurtosis, which
requires \((\alpha + \beta)^2 + \alpha^2 (k_u - 1) < 1\), the optimal \(R^2\) is

\[ R^2 = \frac{\alpha^2}{1 - 2\alpha\beta - \beta^2}. \tag{15.15} \]

As \(\alpha\) and \(\beta\) approach the boundary that defines infinite kurtosis for returns, \(R^2\)
approaches the upper bound \(1/k_u\).
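A short numerical check of (15.15) and of its limit \(1/k_u\) is sketched below. The parameter values are hypothetical, the function names are ours, and normal innovations (\(k_u = 3\)) are assumed by default.

```python
import math


def garch_r2(alpha, beta):
    """Population R^2 in (15.15) for GARCH(1, 1) conditional-variance forecasts."""
    return alpha ** 2 / (1.0 - 2.0 * alpha * beta - beta ** 2)


def has_finite_kurtosis(alpha, beta, k_u=3.0):
    """Condition (alpha + beta)^2 + alpha^2 (k_u - 1) < 1 for finite kurtosis."""
    return (alpha + beta) ** 2 + alpha ** 2 * (k_u - 1.0) < 1.0


def alpha_at_boundary(beta, k_u=3.0):
    """Positive alpha solving k_u*alpha^2 + 2*beta*alpha + beta^2 = 1 (the boundary)."""
    return (-beta + math.sqrt(beta ** 2 + k_u * (1.0 - beta ** 2))) / k_u


# Hypothetical daily GARCH(1, 1) parameters.
alpha, beta = 0.05, 0.90
print(has_finite_kurtosis(alpha, beta), garch_r2(alpha, beta))

# On the boundary the formula returns the upper bound 1/k_u.
print(garch_r2(alpha_at_boundary(beta), beta))
```

For these illustrative parameters the population \(R^2\) is far below one-third, which shows how weak the correlation criterion can look even when the forecast is the true conditional variance.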

15.4 Historical Volatility Forecasts 

15.4.1 Methods 


Historical forecasts and volatility targets are obtained from information sets \(I_t\)
that only contain asset prices known at time \(t\). Simple benchmark forecasts of the
next volatility outcome \(y_{t+1}\) are

\[ f_{t+1} = y_t \tag{15.16} \]

and

\[ f_{t+1} = \frac{1}{M} \sum_{i=0}^{M-1} y_{t-i}. \tag{15.17} \]

The previous value forecast, (15.16), is sometimes called a random walk forecast
because it is optimal when changes in the target are unpredictable. The moving
average of previous values, (15.17), depends on the single parameter \(M\) and
common values are between 20 and 100 inclusive. The average becomes the
historical mean when \(M = t\), which defines another benchmark forecast. Many
volatility forecasting studies first attempt to demonstrate that there are forecasts
that are more accurate than one of these naive benchmarks.

Exponentially weighted moving averages (EWMAs) have a long history in
forecasting literature. They are defined recursively by

\[ f_{t+1} = \gamma y_t + (1 - \gamma) f_t = \gamma \sum_{i=0}^{t-1} (1 - \gamma)^i y_{t-i} + (1 - \gamma)^t f_1. \tag{15.18} \]

EWMAs depend on the smoothing parameter \(\gamma\) and the initial prediction \(f_1\), with
\(0 < \gamma < 1\). Such forecasts have two advantages. First, they give more weight
to the most recent observations. Second, they are robust against changes in the
unconditional mean level of volatility that occur over very long periods of time
(Pagan and Schwert 1990). EWMAs are used to predict volatility in Taylor (1986)
and in the RiskMetrics methodology developed by JPMorgan.
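The benchmark forecasts (15.16) and (15.17) and the EWMA (15.18) can be sketched as follows. The function names, the data, and the choice \(\gamma = 0.1\) are illustrative assumptions. The list `y` holds the outcomes \(y_1, \ldots, y_t\) in time order.

```python
def previous_value_forecast(y):
    """(15.16): the forecast of the next outcome is the latest outcome."""
    return y[-1]


def moving_average_forecast(y, m):
    """(15.17): the average of the latest m outcomes."""
    if not 1 <= m <= len(y):
        raise ValueError("m must be between 1 and the number of outcomes")
    return sum(y[-m:]) / m


def ewma_forecast(y, gamma, f1):
    """(15.18) by recursion: f_{t+1} = gamma * y_t + (1 - gamma) * f_t, from f_1."""
    f = f1
    for y_t in y:
        f = gamma * y_t + (1.0 - gamma) * f
    return f


def ewma_forecast_weighted_sum(y, gamma, f1):
    """(15.18) as an explicit weighted sum plus the decayed initial prediction."""
    t = len(y)
    total = sum(gamma * (1.0 - gamma) ** i * y[t - 1 - i] for i in range(t))
    return total + (1.0 - gamma) ** t * f1


# Hypothetical volatility outcomes y_1, ..., y_5.
y = [0.8, 1.2, 0.9, 1.5, 1.1]
print(previous_value_forecast(y))
print(moving_average_forecast(y, 2))
print(ewma_forecast(y, 0.1, 1.0), ewma_forecast_weighted_sum(y, 0.1, 1.0))
```

The recursion needs only the latest forecast and the latest outcome, which is why it is the form normally used in practice; the weighted sum makes the geometric decay of the weights explicit.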

The EWMA forecast is optimal when the outcomes \(y_{t+1}\) are generated by an
ARIMA(0, 1, 1) process and the loss function is the squared forecast error. As
there is plenty of evidence against a unit-root in volatility during recent years,
forecasts have been derived from ARMA(p, q) processes (e.g. Schwert 1989;
West and Cho 1995) and, more recently, ARFIMA(p, d, q) processes with
\(0 < d < 1\) (e.g. ABDL 2003; Pong et al. 2004). These forecasts are particularly
appropriate when the target \(y_{t+1}\) is a high-frequency measure of realized volatility.

ARCH models provide a vast variety of volatility forecasts. Assuming expected
returns are a constant \(\mu\), the natural volatility target is the next squared excess
return, \(y_{t+1} = (r_{t+1} - \mu)^2\). This is predicted by the conditional variance
\(h_{t+1} = \mathrm{var}(r_{t+1} \mid I_t)\). An advantage of the ARCH approach is that maximum likelihood
methods can be used to select a specification for \(h_t\) and to estimate the model
parameters. Probably the most popular specification for forecasting purposes is
GARCH(1, 1). Then

\[ h_{t+1} = \omega + \alpha (r_t - \mu)^2 + \beta h_t \tag{15.19} \]

as has been discussed at length in Sections 9.3 and 9.4. Asymmetric specifications
are, however, appropriate for equity markets. The GJR(1, 1) specification, of
Sections 9.7 and 9.8, is the convenient extension of GARCH(1, 1) defined by

\[ h_{t+1} = \omega + (\alpha + \alpha^- S_t)(r_t - \mu)^2 + \beta h_t \tag{15.20} \]

with \(S_t = 1\) if \(r_t < \mu\), otherwise \(S_t = 0\).

So far we have only described predictions one period into the future. Now let
\(H > 1\) be a general horizon for more distant forecasts and suppose the target
\(y_{t,H}\) is the sum of \(H\) one-period realized volatilities, as illustrated in (15.5).
The \(H\)-period forecast is then the one-period forecast multiplied by \(H\) for the
benchmark and EWMA methods. This scaling rule does not apply to forecasts
obtained from stationary ARMA or ARCH models. For a general ARCH model,
that has a constant conditional expected return, the theoretically optimal forecast
of \(\sum_{j=1}^{H} (r_{t+j} - \mu)^2\) is

\[ h_{t+1} + \sum_{j=2}^{H} E[h_{t+j} \mid I_t]. \tag{15.21} \]
For the GJR(1, 1) model, with symmetric conditional distributions, the average
value of \(S_{t+j}\) is one-half and hence the above conditional expectations have

\[ E[h_{t+j} \mid I_t] = \omega + (\alpha + \tfrac{1}{2}\alpha^- + \beta)\, E[h_{t+j-1} \mid I_t], \quad j = 2, 3, 4, \ldots. \tag{15.22} \]

The optimal forecast in (15.21) then simplifies to

\[ f_{t,H} = H\sigma^2 + \frac{1 - \phi^H}{1 - \phi} (h_{t+1} - \sigma^2) \tag{15.23} \]

with \(\phi = \alpha + \tfrac{1}{2}\alpha^- + \beta\) and \(\sigma^2 = \omega/(1 - \phi)\) respectively equal to the persistence
parameter and the unconditional variance of returns. This is identical to formula
(9.22) for the GARCH(1, 1) model, as then \(\alpha^- = 0\).
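The agreement between the recursion (15.22) and the closed form (15.23) can be checked numerically. The sketch below assumes hypothetical GJR(1, 1) parameter values and a persistence \(\phi\) below one; the function and argument names are ours.

```python
import math


def gjr_forecast_recursive(h_next, omega, alpha, alpha_minus, beta, horizon):
    """Sum of E[h_{t+j} | I_t] for j = 1, ..., H, built from the recursion (15.22)."""
    phi = alpha + 0.5 * alpha_minus + beta
    total = 0.0
    expected_h = h_next  # j = 1: h_{t+1} is known at time t
    for _ in range(horizon):
        total += expected_h
        expected_h = omega + phi * expected_h  # step from j to j + 1
    return total


def gjr_forecast_closed_form(h_next, omega, alpha, alpha_minus, beta, horizon):
    """Closed form (15.23); assumes the persistence phi is less than one."""
    phi = alpha + 0.5 * alpha_minus + beta
    sigma2 = omega / (1.0 - phi)  # unconditional variance
    return horizon * sigma2 + (1.0 - phi ** horizon) / (1.0 - phi) * (h_next - sigma2)


# Hypothetical daily parameters: phi = 0.98 and unconditional variance 5e-5.
params = dict(h_next=1e-4, omega=1e-6, alpha=0.03, alpha_minus=0.06, beta=0.92, horizon=21)
print(gjr_forecast_recursive(**params))
print(gjr_forecast_closed_form(**params))
print(math.sqrt(gjr_forecast_closed_form(**params)))  # 21-day standard deviation
```

Setting `alpha_minus=0.0` gives the corresponding GARCH(1, 1) forecast. When \(h_{t+1}\) exceeds \(\sigma^2\), the \(H\)-period forecast is less than \(H\) times the one-period forecast, which is why the simple scaling rule fails for stationary ARCH models.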


15.4.2 Comparisons of Historical Forecasts 


Numerous comparisons of the accuracy of naive, EWMA, ARCH, and other his- 
torical volatility forecasts are discussed in Poon and Granger (2003). Of particular 
interest are comparisons between multi-parameter methods (such as GARCH) and 
single-parameter methods (such as EWMA). These comparisons should avoid in- 
sample parameter optimization, because different conclusions can arise for out- 
of-sample forecasts (see, for example, Dimson and Marsh 1990; Ederington and 
Guan 2002c). 

The early study of Taylor (1986, Chapter 4) makes several comparisons between 
EWMA and GARCH(1, 1) predictions of the next daily absolute return for forty 
assets, including stocks, commodities, and currencies. Out-of-sample comparisons
of mean squared errors marginally favor the EWMA approach when averages
are taken across all the series. The recommended values of the smoothing
parameter \(\gamma\) are 0.04 for equities and 0.1 for other assets. Both the EWMA and
GARCH predictors are more accurate than the prior sample mean for every series. 
In contrast, Akgiray (1989) finds GARCH(1, 1) is a more accurate predictor than 
EWMA for monthly realized variances calculated from daily CRSP index returns 
between 1963 and 1986. 

There is no consensus about the relative accuracy of historical volatility fore- 
casts for equity markets. Tse (1991) and Tse and Tung (1992) prefer EWMA 
forecasts, respectively for Japan and Singapore. Brailsford and Faff (1996), how- 
ever, find EWMA is poor for the Australian market and they favor forecasts from 
the GJR(1, 1) specification. Franses and van Dijk (1996) disagree. They recom- 
mend the QGARCH specification (equation (10.7)) for five European markets and 
find that GJR is much less accurate. Heynen and Kat (1994) evaluate but do not 
recommend asymmetric specifications; instead they prefer stochastic volatility 
forecasts to GARCH(1, 1) and EGARCH(1, 1) forecasts for seven major equity 
markets. Balaban, Bayar, and Faff (2003) is the most comprehensive study and it 
covers fourteen countries. Their most accurate forecasts of weekly and monthly 
volatility, obtained from daily index returns, are given by exponentially weighted 
averages. The variety of conclusions must be a consequence of using a variety of 
markets, data frequencies, and loss functions. Many of the apparent differences 
in accuracy across methods may not be statistically significant, as there are often 
a small number of independent out-of-sample forecast errors. 

Exchange rate volatility may be easier to predict. Forecasts from the 
GARCH(1, 1) specification are recommended in the study of five currencies by
Heynen and Kat (1994). They consider horizons ranging from 2 to 100 days, with 
an out-of-sample period from 1988 to 1992. West and Cho (1995), however, find 
that a constant is more accurate than GARCH and related forecasts, when mak- 
ing out-of-sample forecasts of the squares of weekly returns from five exchange 
rates between 1981 and 1989. This negative result may be a consequence of using 
weekly observations. Taylor (1987) instead uses daily high, low, and close prices, 
which are used to define a variety of DM/$ volatility forecasts that are more 
accurate than a constant during a short out-of-sample period from 1982 to 1983. 

Several historical forecasts are compared by Ederington and Guan (2002c) 
for long daily time series of returns from US equities, the S&P 500 index, the 
DM/$ exchange rate, and US interest-rate securities. The clear winner from their 
comparisons is a linear function of the EWMA calculated from daily absolute 
returns. With \(f_{t+1}\) defined by (15.18), and with \(y_{t+1} = |r_{t+1}|\), their preferred
forecast is essentially \(\tilde{f}_{t+1} = \alpha + \beta f_{t+1}\). The three parameters \(\alpha\), \(\beta\), and \(\gamma\) are
estimated from data that precedes time \(t + 1\). This forecast outperforms a similar
construction from squared returns and a variety of forecasts defined by ARCH
models.

Intraday volatility forecasting is complicated by the intraday patterns discussed
in Section 12.5. Martens et al. (2002) compare several methods that incorporate 
these patterns into forecasts of realized thirty-minute volatility for exchange rates. 


15.5 Forecasts from Implied Volatilities 
15.5.1 Which Implieds? 


Following publication of the Black-Scholes formula, it was soon known that 
implied volatility (IV) covaries with realized volatility (Latane and Rendleman 
1976; Chiras and Manaster 1978). To produce a volatility forecast from options 
prices we must select one or more of the available prices. These prices and their 
corresponding IVs depend on the time to expiry T, the exercise price X, and 
whether the option is a call or a put. As the call and put IVs are very similar, 
when the two options have the same T and X, it is usual to average the two 
numbers. Averaging across call and put implieds reduces any measurement error 
from nonsynchronous asset and option prices, because the typical call error is 
then negatively correlated with the put error. 

The expiry time T should be matched with the horizon H of the volatility 
forecast. Many evaluations of IV forecasts first select T and then define H = T.
Other evaluations focus on a specific short horizon, usually between one day
and one month. Then IVs are often calculated from the options that are nearest 
to expiry. The nearest IV is, however, particularly noisy when T is only a few 
days. It is therefore common to select the nearest IV when T is at least one or two 
weeks, and otherwise the second-nearest IV is selected. Differences between T 
and H might be expected to handicap IVs in forecasting competitions that also 
involve historical forecasts. The horizon mismatch can be avoided by estimating 
a term structure model for the IVs, but this involves extrapolation when H is less 
than the least value of T. Xu and Taylor (1995) obtain similar results from a term 
structure IV forecast and the nearest or second-nearest IV. 

Smile effects in the IV matrix provide a variety of IV numbers once T has been 
chosen. This variety is often ignored by only using one value of X. The natural 
choice is then the X that is nearest to either the asset price S or the forward price F. 
These choices tend to focus on the most heavily traded options. Alternatively, some 
combination of IVs can be calculated, with the intention of reducing measurement 
error and thereby improving forecast accuracy. Most proposals give most weight 
to nearest-the-money options. Ederington and Guan (2002a) compare several 
weighting schemes for IVs calculated from S&P 500 futures options. They find 
that the choice is unimportant, providing bias is removed from the IV forecasts. 
In a related paper they average the IVs for the two nearest-the-money options 
(Ederington and Guan 2002b). The VIX index of Fleming et al. (1995) is a similar 
measure of the at-the-money IV.

Model-free volatility expectations are an important and recent alternative to
implied volatilities (Jiang and Tian 2004). For a selected value of T, option prices
for all available exercise prices X are used to approximate the variance expectation
defined by equation (14.44).


15.5.2 Scaling


Implied volatilities are annualized standard deviations. The equivalent standard
deviation over \(H\) days is given by the formula

\[ \mathrm{IV}_{\mathrm{scaled}} = \sqrt{\frac{H}{N}}\; \mathrm{IV} \tag{15.24} \]

with \(N\) the number of days in one year. The standard convention is to count only
trading days, since volatility is much lower when markets are closed. For a typical
one-month forecast, \(H\) and \(N\) respectively equal 21 and 252.
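The scaling rule (15.24) is a one-line calculation. In the sketch below the function name and the 20% implied volatility are hypothetical; the values \(H = 21\) and \(N = 252\) are the one-month trading-day convention mentioned in the text.

```python
import math


def scale_implied_volatility(iv_annual, horizon_days, days_per_year=252):
    """(15.24): convert an annualized standard deviation to an H-day standard deviation."""
    return math.sqrt(horizon_days / days_per_year) * iv_annual


# A hypothetical implied volatility of 20% per annum, scaled to one month and to one day.
print(scale_implied_volatility(0.20, 21))
print(scale_implied_volatility(0.20, 1))
```

The same function gives the one-period measure needed when an implied volatility is compared with daily returns: set `horizon_days=1`.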


15.5.3 Interpretation of Implieds as Forecasts 


It is tempting to identify an IV forecast of volatility with the market’s expectation 
of average volatility during the forecast horizon. The variation within IVs, from 
smile effects, shows that this identification cannot be correct for all exercise prices. 
Theoretical analysis of at-the-money IVs shows that they are candidates for market 
expectations when volatility is stochastic and additional assumptions are made. 
From Hull and White (1987), an essential assumption is that volatility has no 
risk premium, as discussed in Section 14.5. This is not an innocuous assumption. 
Bakshi and Kapadia (2003) show that some equity option prices are higher than 
would occur if there was no premium, which can be interpreted as reflecting a 
negative premium. This is consistent with observed biases in IV forecasts (e.g. 
Ederington and Guan 2002a,b). There is also evidence of a time-varying premium 
in currency option prices (Guo 1998). 

A more pragmatic interpretation of an IV is that it contains all the information
necessary to derive the market's expectation of average volatility during
the forecast horizon. In particular, we might hope that some linear combination
\(\alpha_t + \beta_t \mathrm{IV}_t\) is an appropriate forecast, with \(\alpha_t\) and \(\beta_t\) estimated from observations
of IV and the volatility target up to time \(t\). This approach is followed in Blair
et al. (2001b) and Ederington and Guan (2002a). The same idea, with \(\alpha_t\) and \(\beta_t\)
constant through time, is implicit in many studies that rank forecasting methods
using the correlation between volatility outcomes and forecasts.


15.5.4 Comparisons of Implied and Historical Predictors 


Several econometric methods and data frequencies have been used to compare the 
predictive accuracy of volatility forecasts obtained from options prices with those 
obtained from the history of the asset price. We now discuss several comparative
studies. Low-frequency studies that use daily or less frequent price information are
discussed under two headings. Initially, we concentrate on regression and related 
methods, followed in Section 15.6 by a detailed description of studies that are 
strongly influenced by ARCH methodology. These low-frequency studies lead to 
the conclusion that the best forecasts can usually be obtained from options prices. 
High-frequency studies are more recent and are reviewed in Section 15.7. There 
is much relevant additional information in intraday prices. Indeed, the general 
conclusion that forecasts from options prices are always best cannot be sustained 
in a high-frequency context. 


15.5.5 Regression Analysis 


Regression studies are based upon the encompassing methodology introduced in
Fair and Shiller (1989). The model to be estimated in our context is

\[ y_{t,t+H} = \alpha + \beta_1 f_{IV,t,t+H} + \beta_2 f_{TS,t,t+H} + e_{t,t+H} \tag{15.25} \]

with \(y\) the target to be predicted, \(f_{IV}\) a forecast from options prices, \(f_{TS}\) a forecast
from a time series of returns, and \(e\) the forecast error. The forecasts are made at
time \(t\). Often the horizon \(H\) is the remaining life of a selected option and \(y\) is
the realized standard deviation of returns from times \(t + 1\) to \(t + H\) inclusive.
Then \(t + H\) is the same for several times \(t\) and there is substantial autocorrelation
among the overlapping errors \(e\). This necessitates correction of standard errors,
as explained by Jorion (1995), who applies the method of Hansen (1982). The
options market contains all the relevant information about future volatility when
\(\beta_2 = 0\). At the other extreme, options prices are redundant when forecasting
volatility if \(\beta_1 = 0\). Estimates of \(\beta_1\) and \(\beta_2\) that are obtained by the ordinary
least squares method are generally biased because of measurement errors in the
IV forecasts. This bias is explained by Christensen and Prabhala (1998). They
recommend an instrumental variable regression methodology that reduces the
bias.
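A minimal sketch of the point estimates in the encompassing regression (15.25) is given below, using ordinary least squares on simulated data. Everything in it is an assumption made for illustration: the data are artificial, the helper names are ours, and the true coefficients (0.01, 0.8, 0) are chosen so that the time series forecast is redundant. The sketch does not compute standard errors; with overlapping horizons these would need the corrections of Hansen (1982) or Newey and West (1987), and the measurement-error bias noted above is also ignored.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def encompassing_regression(y, f_iv, f_ts):
    """OLS estimates (alpha, beta1, beta2) for y = alpha + beta1*f_iv + beta2*f_ts + e."""
    rows = [[1.0, a, b] for a, b in zip(f_iv, f_ts)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y_t for r, y_t in zip(rows, y)) for i in range(3)]
    return solve_linear_system(xtx, xty)  # normal equations (X'X) b = X'y


# Simulated (hypothetical) forecasts and outcomes; beta2 = 0 by construction.
rng = random.Random(0)
f_iv = [rng.uniform(0.10, 0.30) for _ in range(200)]
f_ts = [rng.uniform(0.10, 0.30) for _ in range(200)]
y = [0.01 + 0.8 * a + rng.gauss(0.0, 0.01) for a in f_iv]

alpha_hat, beta1_hat, beta2_hat = encompassing_regression(y, f_iv, f_ts)
print(alpha_hat, beta1_hat, beta2_hat)
```

In applied work a statistics library that provides autocorrelation-consistent standard errors would normally replace the hand-written solver.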

Ederington and Guan (2002b) provide regression results for S&P 500 futures
options from 1983 to 1995. They also summarize and discuss regression results
included in Day and Lewis (1992, 1993), Canina and Figlewski (1993),
Lamoureux and Lastrapes (1993), Jorion (1995), Guo (1996), and Christensen and
Prabhala (1998). Overall, the estimates of \(\beta_1\) are nearly always positive and reject
the null hypothesis \(\beta_1 = 0\), while many estimates of \(\beta_2\) are near zero and accept
the possibility \(\beta_2 = 0\). IV forecasts are, however, generally biased; univariate
regressions of \(y\) against \(f_{IV}\) usually estimate \(\beta_1\) to be below one and reject the
joint hypothesis that \(\alpha = 0\) and \(\beta_1 = 1\). These are all tests about information
content, rather than out-of-sample accuracy, as emphasized after equation (15.13).

Jorion (1995) investigates forecasts of exchange rate volatility, for DM/$, SF/$,
and yen/$ rates from 1985 to 1992. He first regresses daily absolute returns against
implieds, a historical moving average, and a GARCH(1, 1) predictor. The values
of \(R^2\) and hypothesis tests lead to the conclusion that implieds contain all the
relevant information. The DM/$ estimates are \(\beta_1 = 0.785\) and \(\beta_2 = 0.085\) for
the GARCH forecasts, with \(R^2 = 0.052\). He then regresses the realized standard
deviation during the remaining lifetimes of the options against the same
explanatory variables and obtains the same conclusions.

The empirical evidence for US equity indices mostly favors implied predic- 
tors. Canina and Figlewski (1993) assert that there was virtually no correlation 
between implied volatility and subsequent realized volatility for the S&P 100 
index, from March 1983 to March 1987. This conclusion does not hold after the 
1987 crash, so it might just refer to an unusual period. Errors in the measurement 
of implied volatilities and problems that stem from overlapping regressions are 
the preferred explanations of Christensen and Prabhala (1998). Their analysis of 
S&P 100 data from 1983 to 1995 finds implieds outperform historical forecasts 
and often contain all relevant information, particularly after the crash. Similar 
conclusions are given by Ederington and Guan (2002b) for the same years, but 
for S&P 500 futures options. For predictions of the realized volatility during the 
period from seven to ninety days before expiry they estimate \(\beta_1 = 0.515\) and
\(\beta_2 = 0.034\), after excluding three months around the 1987 crash. Fleming (1998)
uses the generalized method-of-moments methodology to lend further support to 
the superior information content of implied forecasts of S&P 100 index volatility. 
He finds implieds are biased predictors during his sample period from 1985 to 
1992 that excludes the months around the crash. The forecast errors from implieds 
are almost uncorrelated with historical predictors over horizons equal to one day, 
one month, and the lifetimes of options. This contrasts with significant correlation 
between implieds and the forecast errors made by historical predictors. 


15.6 ARCH Forecasts that Incorporate Implied Volatilities 


Volatility forecasts can be obtained by estimating ARCH models from the information
provided by asset returns and implied volatilities. A forecast for the next
period is defined by the conditional variance of the next return and forecasts further
ahead can also be derived. At time \(t\) the relevant information set is defined
as \(I_t = \{r_{t-i}, v_{t-i}, i \ge 0\}\), with \(v_t\) a variable obtained from one or more implied
volatilities. Often \(v_t\) is the scaled implied for an option that is approximately
at-the-money and near to expiry. The scale factor defined in (15.24) ensures \(v_t\) is
a volatility measure for one trading period, rather than for one year. Following
Day and Lewis (1992), we now show how to include implied volatilities in ARCH
models, then we assess the volatility information content of option prices, and
finally we describe the relative accuracy of various forecasts derived from ARCH
specifications.


15.6.1 Variance Specifications


The general ARCH model described in Sections 9.5 and 9.6 is defined by conditional
distributions:

\[ r_t \mid I_{t-1} \sim D(\mu_t, h_t). \tag{15.26} \]

The distribution \(D\) has conditional mean \(\mu_t\) and conditional variance \(h_t\). The
distribution may be normal, or it may be fat-tailed and a function of a shape
parameter. To simplify our presentation of some specifications for \(h_t\) we suppose
\(\mu_t\) is constant and henceforth denoted by \(\mu\).

The popular GARCH(1, 1) specification can be modified to incorporate the
additional implied volatility variable \(v_t\) as follows:

\[ h_t = \omega + \alpha (r_{t-1} - \mu)^2 + \beta h_{t-1} + \delta v_{t-1}^2. \tag{15.27} \]

The parameter vector \(\theta\) of the above specification contains \(\mu, \alpha, \beta, \delta, \omega\) plus any
shape parameter. With \(L\) the lag operator introduced in Section 3.5,

\[ (1 - \beta L) h_t = \omega + \alpha (r_{t-1} - \mu)^2 + \delta v_{t-1}^2 \]

and hence

\[ h_t = \sum_{i=0}^{\infty} \beta^i L^i \bigl(\omega + \alpha (r_{t-1} - \mu)^2 + \delta v_{t-1}^2\bigr)
       = \frac{\omega}{1 - \beta} + \sum_{i=0}^{\infty} \beta^i \bigl(\alpha (r_{t-i-1} - \mu)^2 + \delta v_{t-i-1}^2\bigr). \]

Consequently, the weights for the lagged options terms decay at the same rate as
those for the lagged squared excess returns. This is an unnecessarily restrictive
assumption. A more flexible specification is

\[ h_t = \frac{\omega + \alpha (r_{t-1} - \mu)^2}{1 - \beta L} + \frac{\delta v_{t-1}^2}{1 - \beta_v L} \tag{15.28} \]

which requires an additional parameter, \(\beta_v\). To evaluate (15.28), note that expressions
of the form \(y_t = x_t/(1 - \beta L)\) can be evaluated as \(y_t = \beta y_{t-1} + x_t\) after
assuming an initial value \(y_1\).
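The evaluation of (15.27) and (15.28) by recursion can be sketched as follows. All names and numbers are hypothetical, and the starting values are assumptions that a user would have to choose, for example from a sample variance. Each list of returns and implieds is in time order, and entry \(k\) of an output list is the variance that follows \(k\) observations.

```python
import math


def lag_filter(x, beta, y_init):
    """Evaluate y_t = x_t / (1 - beta L) via y_t = beta * y_{t-1} + x_t, from y_init."""
    y = [y_init]
    for x_t in x:
        y.append(beta * y[-1] + x_t)
    return y


def modified_garch_variances(r, v, mu, omega, alpha, beta, delta, h_init):
    """Conditional variances from (15.27), starting from an assumed initial variance."""
    h = [h_init]
    for r_prev, v_prev in zip(r, v):
        h.append(omega + alpha * (r_prev - mu) ** 2 + beta * h[-1] + delta * v_prev ** 2)
    return h


def flexible_variances(r, v, mu, omega, alpha, beta, delta, beta_v, a_init, b_init):
    """Conditional variances from (15.28): separate decay rates beta and beta_v.

    a_init and b_init are assumed initial values of the returns and options components.
    """
    returns_part = lag_filter([omega + alpha * (x - mu) ** 2 for x in r], beta, a_init)
    options_part = lag_filter([delta * x ** 2 for x in v], beta_v, b_init)
    return [a + b for a, b in zip(returns_part, options_part)]


# Hypothetical daily returns and scaled (one-day) implied volatilities.
r = [0.010, -0.020, 0.005, 0.000, -0.012]
v = [0.012, 0.011, 0.013, 0.012, 0.012]
common = dict(mu=0.0, omega=1e-6, alpha=0.05, beta=0.80, delta=0.10)

h_restricted = modified_garch_variances(r, v, h_init=1.4e-4, **common)
h_flexible = flexible_variances(r, v, beta_v=0.50, a_init=1.0e-4, b_init=0.4e-4, **common)
print(h_restricted[-1], h_flexible[-1])
```

When `beta_v` equals `beta` and the two initial components sum to `h_init`, the flexible specification reproduces (15.27) exactly, which is the sense in which (15.27) is a restricted case of (15.28).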

Asymmetric specifications, such as the GJR-GARCH and the EGARCH models
in Sections 9.7 and 10.2, can also be modified to include implied volatilities by
adding an additional term to the definition of \(h_t\). For example, Day and Lewis
(1992) extend EGARCH(1) by changing equation (10.1) to

\[ \log(h_t) = \mu_{\log(h)} + \Delta\bigl(\log(h_{t-1}) - \mu_{\log(h)}\bigr) + g(z_{t-1}) + \delta \log(v_{t-1}^2). \tag{15.29} \]


15.6.2 Information Content: Theory


Parameter estimates, standard errors, and hypothesis test results are obtained by 
using the likelihood methods for ARCH models described in Sections 9.5, 9.6, 
and 10.4. 

Tests for information content are of particular interest. There is no incremental
volatility information in the options prices when a suitable constraint on the
parameter vector defines the best model. For example, the options information is
ignored when \(\delta = 0\) in any one of equations (15.27)-(15.29). This null hypothesis
can be assessed by estimating \(\delta\) and an appropriate standard error, followed
by calculating \(t = \hat{\delta}/\mathrm{s.e.}(\hat{\delta})\). Rejection of the null is equivalent to statistically
significant incremental information.

The other extreme, as regards information content, is when all the relevant
volatility information is to be found in option prices. This could be stated as
\(\alpha = 0\) for the modified GARCH(1, 1) model, which allows \(h_t\) to depend on
\(v_{t-2}, v_{t-3}, \ldots\) as well as on the latest implied \(v_{t-1}\). At an efficient options market,
\(v_{t-1}\) will contain all the relevant information when measurement errors are
negligible, so the null hypothesis is often stated as \(\alpha = \beta = 0\). There is then no
incremental information in asset returns when the null hypothesis is true. These
and similar hypotheses can be decided by likelihood-ratio tests.

Note that the tests do not assess forecast accuracy, because they are "in-sample."
Parameter estimates obtained by maximizing the likelihood function, for data at
times \(1 \le t \le n\), are only known at time \(n\) when the data end. The estimated
parameters will define ex ante forecasts if they are used to predict volatility for
times \(t > n\).


15.6.3 Information Content: Results 


We now review the results of five empirical studies published between 1992 and 
1996, which provide results for equities, commodities, and currencies using daily 
and weekly data. These studies indicate that implied volatilities are generally rich 
in relevant information. More recent studies also use high-frequency data and are 
reviewed in Section 15.7. 

Day and Lewis (1992) estimate specifications that are similar to the modified
GARCH(1, 1) and EGARCH(1) models, (15.27) and (15.29), with conditional
normal distributions. Their data are weekly returns (in excess of the risk-free rate)
and implied volatilities for the S&P 100 index, for the six years from November
1983 to December 1989. Their GARCH estimates of \(\alpha\) and \(\delta\) for Wednesday to
Wednesday returns are respectively 0.27 and 0.32, with robust t-ratios equal to
1.17 and 3.00. However, Friday to Friday returns give t-ratios equal to 1.33 and
0.96. Likelihood-ratio tests indicate that each of the constraints \(\alpha = \beta = 0\) and
\(\delta = 0\) should be rejected for the Wednesday data. Similar results are obtained from
the EGARCH estimates. Day and Lewis conclude that both returns and implied
volatilities contain incremental information. They suggest that the differences
between the results for the Wednesday and the Friday returns may be explained
by expiration day effects. Lamoureux and Lastrapes (1993) study two years of
daily data for ten US stocks. Their log-likelihoods are higher for GARCH(1, 1)
models than for conditional variances that are functions of implieds alone. This
indicates that the historical information was the more informative source, but it
is difficult to draw conclusions from such short data series.

Kroner, Kneafsey, and Claessens (1995) estimate the modified GARCH(1, 1)
model, (15.27), for seven commodities during the four-year period from 1987 
to 1990. They find incremental information in both implied volatilities and the 
history of returns. 

Xu and Taylor (1995) also estimate the modified GARCH(1, 1) model, but with
conditional generalized error distributions as defined in Section 9.6. Daily returns
for four dollar exchange rates from 1985 to 1989 are used for the tests about
information content. The implied volatilities come from spot FX options traded
in Philadelphia. The null hypothesis that the options information is irrelevant is
rejected at low significance levels by t-ratios and by likelihood-ratio test statistics.
For example, if \(L_0\) is the maximum log-likelihood when \(\delta = 0\) and \(L_1\) is the
unconstrained maximum, then \(\mathrm{LR} = 2(L_1 - L_0)\) should be compared with \(\chi_1^2\).
The LR values equal 36 for sterling, 45 for the Deutsche mark, 35 for the Swiss
franc, and 10 for the yen (their Table 3). All these values exceed 6.63 and hence
reject the null hypothesis at the 1% level. The respective estimates of \(\delta\) are 1.02,
0.93, 0.82, and 0.39. In contrast, the null hypothesis of no incremental information
in the returns can only be rejected for the yen. With \(L_0\) now defined as the
maximum log-likelihood when \(\alpha = \beta = 0\), LR should be compared with \(\chi_2^2\). The
null hypothesis is rejected at the 5% level when LR exceeds 5.99. The test values
are less than 1 for all three European currencies, while LR equals 10.5 for the yen.
As the in-sample estimates of \(\alpha\) are essentially zero for the European currencies,
the Philadelphia options contained all the relevant in-sample information about
one-day ahead conditional variances for these currencies. The same conclusion
is obtained by Guo (1996), who uses the same GARCH specification and similar
exchange rate data for two currencies.
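The likelihood-ratio comparisons quoted above can be reproduced with a short calculation. The sketch below uses the LR values reported in the text; the function names are ours, and the tail probabilities are coded only for one and two degrees of freedom, where closed forms exist.

```python
import math


def likelihood_ratio(loglik_unconstrained, loglik_constrained):
    """LR = 2 (L1 - L0) for maximized log-likelihoods L1 (unconstrained) and L0."""
    return 2.0 * (loglik_unconstrained - loglik_constrained)


def chi2_upper_tail(x, df):
    """P(chi-squared with df degrees of freedom > x), for df = 1 or 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))  # P(|Z| > sqrt(x)) for standard normal Z
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only covers df = 1 and df = 2")


# LR statistics for the null hypothesis delta = 0, as quoted in the text (one constraint).
lr_options = {"sterling": 36.0, "Deutsche mark": 45.0, "Swiss franc": 35.0, "yen": 10.0}
for currency, lr in lr_options.items():
    p_value = chi2_upper_tail(lr, df=1)
    print(currency, lr, p_value, "reject at 1%" if p_value < 0.01 else "do not reject")

# The critical values 6.63 and 5.99 correspond to the 1% and 5% levels.
print(chi2_upper_tail(6.63, df=1), chi2_upper_tail(5.99, df=2))

# LR for alpha = beta = 0 (two constraints): 10.5 for the yen, as quoted in the text.
print(chi2_upper_tail(10.5, df=2))
```

For more constraints, a general chi-squared distribution function from a statistics library would be needed in place of these two closed forms.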


15.6.4 Forecasting Accuracy 


Out-of-sample comparisons of forecasts are included in all the studies of infor- 
mation content that have just been reviewed. These comparisons are of more 
practical relevance than the in-sample tests of hypotheses about information
content. Although it is easy to identify the most accurate out-of-sample forecast, it can
be difficult to say if a forecast is significantly more accurate than some alternative.
In the studies that we discuss, it is found that the in-sample and out-of-sample
methodologies support the same conclusions about the sources of incremental
information.

Day and Lewis (1992) evaluate \(R^2\) for one-step ahead predictions of squared
weekly returns, during the same period as their in-sample tests. The parameters in
their ARCH forecasts are, however, obtained from earlier periods. All the values
of \(R^2\) from univariate regressions are low. Multivariate regressions do not succeed
in identifying the source(s) of relevant information for making predictions.
Lamoureux and Lastrapes (1993) innovate by predicting the sum of squared returns
during the lifetimes of options. The least mean squared errors for ten stocks, over
a two-year period, come from sample averages, followed by GARCH forecasts,
with implieds the least accurate. Encompassing regressions provide a different
ranking of the information in the forecasts, but these regressions have a lookback
bias and are for few independent volatility outcomes.

Kroner et al. (1995) attempt the ambitious task of forecasting commodity price 
volatility over 225 calendar days. They assess several combinations of historical 
and implied forecasts and offer evidence that combinations provide the most 
accurate forecasts. 

Xu and Taylor (1995) use daily time series to predict the realized volatility of 
exchange rates over four-week periods. Forecasts are made for thirty nonover- 
lapping periods, from October 1989 to February 1992. Their option forecasts are 
given by either a term structure model for implieds or by the option whose expiry 
time is nearest to four weeks. Rolling samples of 250 weeks of daily data are 
used to estimate ARCH parameters. The univariate comparison of forecasts finds 
that implied volatilities have smaller mean square errors than either GARCH(1, 1)
forecasts or the previous realized volatility, for all four currencies considered. The 
RMSE values for the £/$ forecasts are typical and equal 0.033 for options fore- 
casts, 0.036 for GARCH forecasts, and 0.041 for the previous realized volatility. 
No significant incremental information is found in the GARCH forecasts when 
bivariate regressions are estimated. Guo (1996) reports the same out-of-sample 
conclusion for similar data; implieds are efficient but biased predictors of sixty- 
day realized volatility and there is no incremental information in sixty-day moving 
averages and in GARCH forecasts. 


15.7 High-Frequency Forecasting Results 


Intraday prices provide more accurate measurements of realized volatility than 
daily prices. They also define more accurate volatility forecasts. Daily high and 
low prices suffice to obtain significant gains in forecast accuracy (Taylor 1987). 
In general, low-frequency predictors are outperformed by time series forecasts 
derived from sums of squared intraday returns (Andersen et al. [ABDL] 2003) 
and from sums of absolute intraday returns (Ghysels, Santa-Clara, and Valkanov 
2004b). 

We now discuss some high-frequency studies that compare the accuracy of
historical and implied predictors. All the studies measure realized volatility by 
sums of squared intraday returns. Most use time series of daily sums to predict 
sums over horizons that range from one day to six months. 


15.7.1 ARFIMA Forecasts versus Implied Predictors 


Historical forecasts of realized volatility have been obtained from ARFIMA mod- 
els, motivated by the evidence for long memory effects presented in Section 12.9. 
A typical model is defined for the logarithms of daily realized volatility \(\hat{\sigma}_t^2\) and
includes a mean level \(\mu\) and autoregressive, fractional, and moving-average filters.
A filtered sequence of i.i.d. variables \(\varepsilon_t\) defines the model:


\[ \log(\hat{\sigma}_t^2) = \mu + (1 - \phi L)^{-1}(1 - L)^{-d}(1 + \theta L)\varepsilon_t. \tag{15.30} \]
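
The fractional filter is the component that generates long memory. As an illustration, the Python sketch below computes the weights in the expansion of \((1 - L)^{-d}\) by the standard recursion \(w_0 = 1\), \(w_j = w_{j-1}(j - 1 + d)/j\), and applies a truncated filter to a sequence of shocks. The function names and the value \(d = 0.4\) are hypothetical choices made for this example and are not estimates from any of the studies discussed.

```python
def fractional_weights(d, n):
    """First n coefficients of the power series for (1 - L)^(-d)."""
    w = [1.0]
    for j in range(1, n):
        w.append(w[-1] * (j - 1 + d) / j)
    return w

def apply_filter(weights, shocks):
    """y_t = sum_j weights[j] * shocks[t - j], truncated at the start of the sample."""
    return [
        sum(weights[j] * shocks[t - j] for j in range(min(t + 1, len(weights))))
        for t in range(len(shocks))
    ]

# Hypothetical long-memory parameter; the weights decay slowly (hyperbolically).
w = fractional_weights(0.4, 200)

# d = 1 reproduces ordinary cumulative summation, d = 0 leaves the shocks unchanged.
unit_root_weights = fractional_weights(1.0, 5)
no_memory_weights = fractional_weights(0.0, 4)
```

The slow decay of the weights, compared with the geometric decay of an autoregressive filter, is the reason why distant observations still matter for ARFIMA forecasts.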


Li (2002) compares out-of-sample ARFIMA forecasts with implied volatilities 
for three dollar exchange rates from 1995 to 1999. Forecasts are made of sums 
of squared five-minute returns over time intervals up to six months. The accuracy 
of the historical and implied forecasts is comparable for the one-month horizon, 
using the \(R^2\) criterion. The values of \(R^2\) for Deutsche mark, yen, and sterling
series are respectively 47%, 51%, and 39% for the implied predictor, compared
with 44%, 47%, and 42% for the historical predictor. The values of \(R^2\) are much
lower for the six-month horizon. They equal 12%, 14%, and 7% for the historical
predictor, and only 2%, 7%, and 0% for the implied predictor. Martens and Zein
(2004) note that the values of \(R^2\) are inflated, because overlapping forecasts are
used in the regressions. 

Pong, Shackleton, Taylor, and Xu (2004) also analyze forecasts for the same 
three exchange rates, and present out-of-sample comparisons for the period from 
1994 to 1998. They obtain ARMA(2, 1) and ARFIMA(1, d, 1) forecasts for realized
volatility and compare them with GARCH forecasts and linear functions
of at-the-money implied volatilities. They find that the AR(FI)MA forecasts are 
the most accurate for one-day and one-week forecast horizons; these forecasts 
then contain significant incremental information beyond the implied volatility 
information. For the longer horizons of one month and three months, the implied 
volatilities are at least as accurate as the historical forecasts and they incorporate 
almost all of the relevant information. The forecasting performances of the short 
(ARMA) and long (ARFIMA) memory specifications are very similar. 

Martens and Zein (2004) study futures markets for the S&P 500 index, the yen/$ 
rate, and crude oil, from the mid 1990s to the end of 2000. They use both floor and 
electronic trading records to measure volatility over 24-hour periods. They pro- 
vide strong evidence for incremental predictive information in the high-frequency 
returns, for all three assets. ARFIMA forecasts, implied volatility forecasts, and 
the averages of the two forecasts are compared for the three assets over six horizons,
which vary from one to forty days. The average has the lowest values of a
mean square error criterion for fifteen of the eighteen comparisons. It also has the
highest value of \(R^2\) for fourteen comparisons.


15.7.2 Historical versus Model-Free Option Forecasts 


Jiang and Tian (2004) innovate by investigating the in-sample information content 
of the model-free variance expectation obtained from all option exercise prices, 
based upon equation (14.44). They evaluate forecasts of S&P 500 realized volatil- 
ity, defined as the sum of squared five-minute returns, for the period from 1988 to 
1994. Their historical forecast for a selected horizon is simply the latest observa- 
tion of the realized volatility over the same horizon. Their regression results for 
nonoverlapping, thirty-day horizons provide strong evidence that the model-free 
forecast subsumes all information in the at-the-money implied volatility and the 
lagged realized volatility. When the forecasts and the target are specified as stan- 
dard deviation measures, the regression coefficients are 0.84 for the model-free 
forecast, −0.05 for the at-the-money implied, and −0.01 for the lagged target,
with \(R^2\) equal to 74%. Similar conclusions and estimates are obtained for forecast
horizons up to 180 days. 


15.7.3 Forecasting Using ARCH Specifications 


The ARCH methodology of Section 15.6 has been applied to high-frequency 
returns in at least two studies. Taylor and Xu (1997) apply the methodology to 
a year of one-hour conditional variances for DM/$ returns. Their variances are
calculated from five-minute returns, hourly returns, and daily implied volatilities
\(v_t\); their specification for \(h_{t,n}\) modifies equation (12.12) by adding a term
proportional to \(v_{t-1}^2\) for hour \(n\) on day \(t\). The five-minute returns are found to contain more information
than the other variables. For the most general model, there is incremental in- 
sample information in both the five-minute returns and the implied volatilities, 
but none in the hourly returns. Forecasts of the realized DM/$ volatility during the 
next hour are evaluated. These forecasts are compared over a three-month period, 
with all parameters estimated from data for the previous nine months. Accuracy 
is measured by both the mean absolute error and the mean squared error. The 
forecasts that use both five-minute returns and options information are found to 
be more accurate than the forecasts that only make use of one of these two sources 
of information. 

To conclude our review of empirical evidence, we now present a detailed sum- 
mary of the study of S&P 100 index volatility by Blair, Poon, and Taylor (2001b). 
They supplement the GJR(1, 1) model for daily returns \(r_t\) by including (i) scaled
implied volatilities \(v_t\), (ii) dummy variables \(d_t\) that are one for the crash day of 19
October 1987 and are otherwise zero, and (iii) intraday realized volatilities \(\hat{\sigma}_t^2\).

Table 15.1. Parameter estimates for an ARCH model
augmented by realized and implied volatilities.


(Results for the S&P 100 index, obtained using daily time series from January 1987 to
December 1992. The conditional variance of the GJR(1, 1) model, augmented to include
daily realized and implied volatilities, is defined by equation (15.31). The conditional mean
is a constant plus a dummy variable on the crash day of 19 October 1987. The parameters are
estimated by maximizing the quasi-log-likelihood function, defined by conditional normal
densities. Robust t-ratios are shown in parentheses. The excess log-likelihoods are relative
to model 1.)


                                            Model
Parameter                1         2         3         4         5         6         7

ω × 10⁶             2.6891    44.797    0.7208    9.6881   12.5372     63594    0.3110
                    (1.53)    (3.85)    (1.48)    (0.74)    (0.72)    (0.55)    (0.70)
α                   0.0136              0.0085              0.0741              0.0029
                    (1.53)              (1.56)              (1.34)              (0.47)
α⁻                  0.0280             −0.0078             −0.0485             −0.0029
                    (1.19)             (−0.79)             (−1.20)             (−0.47)
β                   0.9417    0.9793    0.9773    0.5209   −0.3039    0.5954    0.9695
                   (33.67)   (118.7)   (130.8)    (1.53)   (−1.93)    (3.58)   (95.89)
ψ × 10³              1.563     0.590     0.432     2.104     0.800     1.618     0.259
                    (2.13)    (1.67)    (1.26)    (1.40)    (0.81)    (0.68)    (0.67)
γ                             0.6396    0.5661                        0.3718    0.3742
                              (3.64)    (3.32)                        (1.86)    (1.86)
β_RV                          0.2523    0.2350                        0.0360    0.0539
                              (1.51)    (1.53)                        (0.23)    (0.39)
δ                                                 0.4343    0.4101    0.3283    0.2816
                                                  (6.31)    (6.06)    (4.47)    (3.27)
β_V                                               0.1509    0.1553    0.1943    0.1778
                                                  (2.73)    (2.59)    (3.40)    (3.49)
log L              4833.66   4845.80   4848.35   4851.97   4858.48   4859.70   4860.79
Excess log L                   12.14     14.69     18.31     24.82     26.04     27.13


Tables 15.1–15.3 are reprinted from a paper in the Journal of Econometrics, volume 105, B. J. Blair,
S.-H. Poon, and S. J. Taylor, Forecasting S&P 100 volatility: the incremental information content of
implied volatilities and high frequency index returns, pp. 5–26, Copyright © (2001), with permission
from Elsevier.


Their equation for the conditional variance is


\[ h_t = \frac{\omega + (\alpha + \alpha^- S_{t-1})\varepsilon_{t-1}^2 + \psi d_{t-1}}{1 - \beta L} + \frac{\gamma \hat{\sigma}_{t-1}^2}{1 - \beta_{RV} L} + \frac{\delta v_{t-1}^2}{1 - \beta_V L}, \tag{15.31} \]


with \(S_{t-1} = 1\) when the residual \(\varepsilon_{t-1} = r_{t-1} - \mu_{t-1}\) is negative, otherwise \(S_{t-1} = 0\).

The in-sample information content of daily returns, realized volatilities, and
daily implied volatilities is assessed from six years of daily conditional variances,
from 1987 to 1992. Five-minute and overnight returns are used to calculate \(\hat{\sigma}_t^2\),
while the VIX index of implied volatilities defines \(v_t\). VIX is defined near the end
of Section 14.4 and is intended to contain minimal measurement error. Table 15.1
shows the parameter estimates, robust t-ratios, and maximum log-likelihoods for
seven models when conditional normal distributions are assumed. Models 1–6 are
special cases of the general model 7 defined by (15.31). The columns of Table 15.1
are arranged in the order of ascending likelihood.


Figure 15.1. Forecasts of index volatility.
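
The lag-operator filters in the conditional variance equation (15.31) can be unwound into three first-order recursions, one for each source of information. The Python sketch below illustrates the calculation under stated assumptions: the function name, the dictionary keys, the zero starting values, and the constant input series are all hypothetical, the crash dummy is switched off, and the nonzero parameter values are taken from the model 4 column of Table 15.1.

```python
def augmented_gjr_variances(eps, rv, iv2, dummy, p, start=(0.0, 0.0, 0.0)):
    """Return h where h[k] is the variance for day k+1 given data up to day k.

    eps   : residuals r_t - mu_t
    rv    : intraday realized variances
    iv2   : squared (scaled) implied volatilities
    dummy : crash-day indicator
    p     : dict of parameters (the key names are our own choice)
    """
    a, b, c = start                      # assumed starting values of the three components
    h = []
    for e, s2, v2, d in zip(eps, rv, iv2, dummy):
        s = 1.0 if e < 0.0 else 0.0      # asymmetry indicator
        a = p["omega"] + (p["alpha"] + p["alpha_neg"] * s) * e * e + p["psi"] * d + p["beta"] * a
        b = p["gamma"] * s2 + p["beta_rv"] * b
        c = p["delta"] * v2 + p["beta_v"] * c
        h.append(a + b + c)
    return h

# Model 4 style illustration: only the implied volatility term is active.
p = {"omega": 9.6881e-6, "alpha": 0.0, "alpha_neg": 0.0, "psi": 0.0, "beta": 0.5209,
     "gamma": 0.0, "beta_rv": 0.0, "delta": 0.4343, "beta_v": 0.1509}
days = 200
h = augmented_gjr_variances([0.01] * days, [1e-4] * days, [1e-4] * days, [0.0] * days, p)
```

With constant inputs the recursion converges to \(\omega/(1 - \beta) + \delta v^2/(1 - \beta_V)\), which provides a simple check on an implementation.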

The second model only uses the information in realized volatilities and it has 
a substantially higher log-likelihood than the first model, which only uses daily 
returns. Combining the historical information sources into the third model leads 
to acceptance of the null hypothesis (at the 5% level) that all the relevant historical 
information is in the realized volatilities. The filter \(1/(1 - \beta_{RV} L)\) is applied to
the intraday variable. As the estimate of \(\beta_{RV}\) is only 0.25 for the second model,
most of the information in intraday returns about future volatility comes from the 
most recent day. 

The fourth model only uses the VIX information. As the log-likelihood is 6.17
more than for the second model, which has the same number of parameters,
VIX was more informative than the intraday returns. The estimate of the parameter
\(\beta_V\) in the filter \(1/(1 - \beta_V L)\) is only 0.15, but it has a robust t-ratio equal
to 2.73. Models 5–7 are compared with model 4 to decide if VIX contains all
the relevant information about the next day's volatility. The key comparison is
between model 4 and model 6, which uses both VIX and intraday returns. There
are two additional parameters in model 6 and its log-likelihood is 7.73 more than
for model 4. The likelihood-ratio test conclusively favors model 6, but it is not
robust when the conditional distributions are not normal. The robust t-ratio for
the intraday parameter \(\gamma\) equals 1.86 and the one-tail p-value for the test is 3%.
It is concluded that there was probably some incremental in-sample information
in the intraday returns. Note that comparing \(\gamma\) with \(\delta\) does not give the correct
interpretation of the relative importance of \(\hat{\sigma}_{t-1}^2\) and \(v_{t-1}^2\) when calculating \(h_t\). As
the average level of \(v_t^2\) is 2.6 times that of \(\hat{\sigma}_t^2\) for reasons explained by Blair
et al., the calculation of \(h_t\) for model 6 is dominated by the information in VIX.


Table 15.2. The relative accuracy of S&P 100 index volatility forecasts from January 1993
to December 1999. Values of P for forecasts of sums of (a) squared excess returns, and
(b) squared five-minute and overnight returns.


(The tabulated numbers are proportions, P, of the variability of two realized volatility
measures explained by four forecasts. The proportion P is defined by equation (15.9). It is a
linear function of the mean square forecast error. HV is a simple historic volatility forecast.
The GJR, INTRA, and VIX forecasts are obtained from special cases of the ARCH
equation (15.31). Forecasts are made once a day. The values of P are calculated from 1769 − N
forecasts.)


      Forecast        N=1       N=5      N=10      N=20

(a)   HV            0.007     0.089     0.112     0.128
      GJR           0.106     0.085     0.016     0.013
      INTRA         0.099     0.204     0.214     0.250
      VIX           0.115     0.239     0.297     0.348

(b)   HV            0.167     0.243     0.255     0.255
      GJR          0.3755     0.289     0.169     0.181
      INTRA         0.383     0.494    0.4455     0.465
      VIX           0.401     0.533     0.534     0.545

Blair et al. evaluate out-of-sample forecasts of the volatility of the S&P 100 
index from 1993 to 1999. They predict realized volatility over one, five, ten, 
and twenty trading days and they calculate it in two ways, firstly from daily 
returns and secondly from five-minute and overnight returns. The forecasts are 
obtained from historical volatility (HV), based upon the variance of the previous 
100 daily returns, and from three special cases of the ARCH model (15.31). Each 
special case uses only one source of information, thus the GJR forecasts use
\(r_t\), the INTRA forecasts use \(\hat{\sigma}_t^2\), and the VIX forecasts use \(v_t\). The parameters
of (15.31) are estimated from rolling samples of length 1000 days and they are
used to construct one-day-ahead forecasts \(h_t\) of the next squared daily excess
return. Forecasts further ahead and for the intraday realized volatility are given
by appropriate linear combinations of \(h_t\). Figure 15.1 shows the volatility forecasts
obtained from the VIX series, stated as annualized standard deviations. 

Table 15.2 shows the proportions P of the variability of realized volatility 
that are explained by each of the four forecasts; P is a linear function of the 
mean square forecast error and is defined by (15.9). The VIX forecasts have 
the highest value of P, for both measures of realized volatility and for all four 
forecast horizons. The INTRA forecasts are the most accurate of those that use 
index returns, as should be expected. The values of P are much higher when 
intraday returns are used to calculate the realized volatility target, consistent with 
the predictions of Andersen and Bollerslev (1998b). The effect is seen most clearly
for one-day-ahead forecasts, when P is approximately multiplied by four if the
target is intraday realized volatility rather than the squared excess return.


Table 15.3. Correlations and multiple correlations for S&P 100 index volatility forecasts
from January 1993 to December 1999. Values of \(R^2\) for forecasts of sums of (a) squared
excess returns, and (b) squared five-minute and overnight returns.


(The tabulated numbers are the squared correlation \(R^2\) from regressions of realized volatility
on one or more forecasts. HV is a simple historic volatility forecast. The GJR, INTRA, and
VIX forecasts are obtained from special cases of the ARCH equation (15.31). Forecasts are
made once a day. The values of \(R^2\) are calculated from 1769 − N forecasts.)


      Explanatory variables         N=1       N=5      N=10      N=20

(a)   HV                          0.043     0.111     0.151     0.197
      GJR                         0.118     0.181     0.189     0.223
      INTRA                       0.099     0.212     0.238     0.285
      GJR and INTRA               0.119     0.217     0.240     0.287
      VIX                         0.129     0.249     0.304     0.356
      VIX and INTRA               0.129     0.250     0.304     0.356
      VIX and GJR                 0.139     0.253     0.304     0.356
      VIX, GJR, and INTRA         0.144     0.253     0.304     0.356

(b)   HV                          0.185     0.282     0.309     0.335
      GJR                         0.423     0.449     0.395     0.403
      INTRA                       0.385     0.506     0.482     0.499
      GJR and INTRA              0.4443     0.525     0.490     0.504
      VIX                         0.445     0.567     0.559     0.569
      VIX and INTRA               0.448     0.575     0.560     0.576
      VIX and GJR                 0.491     0.586     0.564     0.576
      VIX, GJR, and INTRA         0.495     0.586     0.565     0.577

Table 15.3 shows selected values of \(R^2\) when the realized volatility measures
are regressed against one or more of the four forecasts. For univariate regressions,
VIX again has the highest values. Multivariate regressions that employ VIX only
increase \(R^2\) by small amounts that are negligible for the longer forecast horizons.
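
The difference between the two accuracy measures used in Tables 15.2 and 15.3 is easily illustrated. Equation (15.9) is not repeated in this section, so the sketch below assumes the usual form \(P = 1 - \sum (y_t - f_t)^2 / \sum (y_t - \bar{y})^2\), which is a linear function of the mean square forecast error. The function names and the toy data are hypothetical. Unlike \(R^2\), the proportion P penalizes a forecast that is biased or badly scaled.

```python
def proportion_explained(y, f):
    """P = 1 - SSE/SST, assumed here to match the definition in equation (15.9)."""
    y_bar = sum(y) / len(y)
    sse = sum((a - b) ** 2 for a, b in zip(y, f))
    sst = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - sse / sst

def squared_correlation(y, f):
    """R-squared from a univariate regression of y on f."""
    n = len(y)
    y_bar, f_bar = sum(y) / n, sum(f) / n
    cov = sum((a - y_bar) * (b - f_bar) for a, b in zip(y, f))
    var_y = sum((a - y_bar) ** 2 for a in y)
    var_f = sum((b - f_bar) ** 2 for b in f)
    return cov * cov / (var_y * var_f)

# Toy data: the forecast is perfectly correlated with the target but twice as large.
realized = [1.0, 2.0, 3.0, 4.0, 5.0]
biased = [2.0 * v for v in realized]
p_biased = proportion_explained(realized, biased)
r2_biased = squared_correlation(realized, biased)
p_perfect = proportion_explained(realized, realized)
```

In this example \(R^2\) equals one although P is negative, which shows why a regression criterion implicitly rescales the forecast using information that is not available when the forecast is made.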


15.8 Concluding Remarks 


The empirical evidence shows that the most accurate forecasts of volatility are 
often provided by functions of implied volatilities obtained from option prices. 
Implieds nearly always contain useful information. They usually need to be scaled 
to remove bias. This may arise because the Black-Scholes formula ignores the 
stochastic behavior of volatility and any volatility risk premium. Historical infor- 
mation about daily asset prices is also informative, particularly for short-term 
predictions up to a week ahead, but much (if not all) of this information is incorpo- 
rated into option prices. Intraday asset prices contain further volatility information 
that is not reflected by option prices at some markets. 

In assessing the evidence it is notable that genuine out-of-sample evaluations of
forecasts are relatively rare. In-sample optimization of parameters, either explic- 
itly or by the use of correlations to assess accuracy, unfortunately hinders the 
interpretation of many forecast comparisons. 


16 Density Prediction for Asset Prices


Probability densities for future asset prices can often be obtained from previous 
asset prices and/or the prices of options. This chapter describes many of the 
methods that have been proposed and provides numerical examples of one-month- 
ahead predictive densities. 


16.1 Introduction 


A volatility forecast is a number that provides some information about the distri- 
bution of an asset price in the future. A far more challenging forecasting problem 
is to use market information to produce a predictive density for the future asset 
price. A realistic density will have a shape that is more general than provided by 
the lognormal family. In particular, a satisfactory density forecasting method will 
not constrain the levels of skewness and kurtosis for the logarithm of the predicted 
price. 

It is quite easy to obtain a predictive density by using a history of asset prices 
to estimate and simulate an ARCH model. The density is then called a real-world 
density, as it reflects the dynamics of real prices. The letter P is used to indicate 
that a density applies to real prices. Predictive densities can also be obtained from 
a set of option prices, based upon a theoretical result for complete markets derived 
by Breeden and Litzenberger (1978). Many empirical methods estimate the risk- 
neutral density for the asset price at the time when the options expire. The letter 
Q is then used. One major distinction between a P-density and a Q-density is that 
the expectation of the former reflects the asset's risk while the expectation of the 
latter does not. There are other distinctions; for example, risk-neutral densities 
for equity indices are more negatively skewed than real-world densities. 

Most of this chapter is about methods for estimating densities from option 
prices, covering first Q-densities and then transformations that provide P-dens- 
ities. These estimation methods are reviewed by Jackwerth (1999). They deserve 
attention because option prices may be anticipated to be more informative than 
the history of asset prices, following our discussion of volatility forecasts in the 
previous chapter. 

Density estimates have many applications. They can be used to assess mar-
ket beliefs about political and economic events, to manage risk, to price exotic 
derivatives, to estimate risk preferences, and to evaluate the rationality of market 
prices. 

Section 16.2 describes and illustrates estimation of a real-world density using 
a history of asset prices alone. Sections 16.3 and 16.4 then cover risk-neutral 
density (RND) concepts and estimation in general terms. They are followed by 
a description of several parametric methods in Sections 16.5 and 16.6 and by 
nonparametric methods in Section 16.7. Some advice about selecting from among 
the many RND methods is offered in Section 16.8. 

Two types of transformations from Q- to P-densities are described in Sec- 
tion 16.9; one is based on stochastic discount factors and a representative agent 
model, while the other uses a recalibration function. Both transformations include 
parameters that can be estimated from a set of density predictions and the actual 
values of the prices that are predicted. The usefulness of these methods is related 
to the rationality of the inputs provided by option prices, which is discussed in 
Section 16.11. 

Numerical examples are provided throughout the chapter for one-month-ahead 
prediction of the FTSE 100 index. An Excel spreadsheet is described in Sec- 
tion 16.10 for a method that is easy to implement, based upon fitting a curve to 
the implied volatility “smile.” Prediction of the probabilities of extreme events is 
particularly difficult and some guidance is offered in Section 16.12. 


16.2 Simulated Real-World Densities 


A time-series model for prices, together with a price history, can be used to find 
a real-world density function for a later price by simulating the model. Different 
models and/or different histories will define different densities. ARCH models 
are ideal for simulations, because there is only one random term per unit time. 
These models are discussed here. A feasible but more complicated alternative is 
simulation of one of the stochastic volatility (SV) models defined in Chapter 11. 
General SV models are defined using two random terms per unit time, with the 
additional complication that the two stochastic processes may not be independent 
of each other. 


16.2.1 ARCH Methodology 


Simulation methods that provide option prices are discussed in Section 14.7. 
Similar methods are applicable when estimating real-world densities. Suppose 
that \(t\) counts trading days and that there is a history of \(m\) observed daily changes
in price logarithms, \(I_m = \{r_t, 1 \le t \le m\}\), with \(r_t = \log(p_t) - \log(p_{t-1})\).
Any dividends are excluded from the "returns" \(r_t\) as our intention is to simulate
prices and not a measure of total wealth that incorporates dividend payments.
The history \(I_m\) is to be used to find the density of the price \(p_{m+n}\) after another
\(n\) days of trading. The current and later prices are also denoted by \(S = p_m\) and
\(S_T = p_{m+n}\) in this chapter, with \(T\) measuring the forecast horizon in years.

We suppose that an ARCH model for prices is estimated from the history \(I_m\),
and that this model is also applicable into the future. For the general structure 
outlined in Sections 9.5 and 9.6, 


\[ r_t = \mu_t + \varepsilon_t = \mu_t + h_t^{1/2} z_t, \tag{16.1} \]


with \(\mu_t\) the conditional mean and \(h_t\) the conditional variance; these conditional
moment functions are determined by the information \(I_{t-1}\) and a parameter vector
\(\theta\). The first \(m\) standardized residuals \(z_t\) are assumed to be independent observations
from a common distribution. This distribution has zero mean, unit variance and
is denoted by D(0, 1). The final \(n\) standardized residuals are random variables.

One simulation involves giving values to \(z_{m+1}, \ldots, z_{m+n}\). These can be obtained
either by making independent random draws from D(0, 1) or by the bootstrap
method that makes independent random selections from the set \(\{z_1, \ldots, z_m\}\).
The first method is used here, while Rosenberg and Engle (2002) use the bootstrap
method to construct densities for the S&P 500 index. The history \(I_m\) provides
\(\mu_{m+1}\) and \(h_{m+1}\). From a simulated \(z_{m+1}\) we obtain a simulated value for \(r_{m+1}\)
using (16.1). This value is added to \(I_m\) to define \(I_{m+1}\) and hence \(\mu_{m+2}\) and \(h_{m+2}\).
Then \(r_{m+2}\) follows from the value of \(z_{m+2}\) and (16.1) again. Repeating this process,
one set of simulated values \(z_{m+1}, \ldots, z_{m+n}\) defines one simulated price,
\(p_{m+n} = p_m \exp(r_{m+1} + r_{m+2} + \cdots + r_{m+n})\). This numerical method can be
repeated as often as desired. A very large number of simulations are required 
to obtain an accurate estimate of the density; 200 000 replications are used by 
Rosenberg and Engle (2002). 
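
The simulation procedure can be sketched in a few lines of Python. This is an illustration under stated assumptions and not the program used to produce the results in this chapter. It assumes a GJR(1, 1)-MA(1) specification with no ARCH-M term and standardized t innovations, borrows parameter values from the FTSE 100 example of Section 16.2.2, sets the unknown final residual to zero, and uses only 5000 replications to keep the run time short. All function and variable names are our own.

```python
import math
import random
import statistics

def standardized_t(rng, nu):
    """Student-t draw with nu > 2 degrees of freedom, rescaled to unit variance."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(nu / 2.0, 2.0)       # chi-squared with nu degrees of freedom
    return z / math.sqrt(chi2 / nu) * math.sqrt((nu - 2.0) / nu)

def simulate_price(p_m, eps_m, h_next, n, par, draw):
    """One simulated p_{m+n}; eps_m is the last residual and h_next is h_{m+1}."""
    eps_prev, h, log_change = eps_m, h_next, 0.0
    for _ in range(n):
        mu_t = par["mu"] + par["theta"] * eps_prev   # MA(1) conditional mean
        eps = math.sqrt(h) * draw()                  # residual for the new day
        log_change += mu_t + eps                     # r_t = mu_t + eps_t
        s = 1.0 if eps < 0.0 else 0.0                # asymmetry indicator
        h = par["omega"] + (par["alpha"] + par["alpha_neg"] * s) * eps * eps + par["beta"] * h
        eps_prev = eps
    return p_m * math.exp(log_change)

par = {"mu": 3.39e-4, "theta": 0.052, "omega": 5.14e-7,
       "alpha": 0.0112, "alpha_neg": 0.0497, "beta": 0.9583}
rng = random.Random(2000)
draw = lambda: standardized_t(rng, 13.0)

# Assumption: the final residual of the history is not reported, so it is set to zero.
prices = [simulate_price(6165.0, 0.0, 1.86e-4, 20, par, draw) for _ in range(5000)]
mean_price = statistics.fmean(prices)
sd_price = statistics.pstdev(prices)
```

The bootstrap alternative only requires a different `draw` function, for example one that returns `rng.choice(z_history)` for a list `z_history` of in-sample standardized residuals.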


16.2.2 An Example 


We illustrate density estimation methods for the FTSE 100 index on one date 
throughout this chapter. We suppose that on Friday, 18 February 2000, we are 
interested in finding a density for the index four weeks later, when March futures 
and options conclude trading at 10:30 local time on 17 March. There are no 
holidays during these four weeks and thus n = 20. The index level was 6165
when the market closed on 18 February 2000. Figure 16.1 shows the index levels 
during the three months up to the day on which density estimates are sought. 

Ten years of daily index levels are used to estimate an asymmetric volatility 
model. The simplest model of Glosten et al. (1993), defined in Section 9.7, is 
extended to the GJR(1, 1)-MA(1)-M specification, whose calculations are illus- 
trated in Section 9.8. This specification has 


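\[ \mu_t = \mu + \lambda\sqrt{h_t} + \Theta\varepsilon_{t-1} \tag{16.2} \]


and


\[ h_t = \omega + \alpha\varepsilon_{t-1}^2 + \alpha^- S_{t-1}\varepsilon_{t-1}^2 + \beta h_{t-1} \tag{16.3} \]


with


\[ S_{t-1} = \begin{cases} 1 & \text{if } \varepsilon_{t-1} < 0, \\ 0 & \text{if } \varepsilon_{t-1} \ge 0. \end{cases} \]


Figure 16.1. FTSE 100 history for three months until 18 February 2000.
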

The ARCH-M parameter \(\lambda\) is insignificant for this dataset; consequently it is
set to zero. The distribution of the standardized residuals is assumed to be the
standardized t-distribution with \(\nu\) degrees of freedom, for which the maximum
log-likelihood is 21.2 more than for conditional normal distributions. The maximum
likelihood estimates of the parameters are \(\mu = 3.39 \times 10^{-4}\), \(\Theta = 0.052\),
\(\omega = 5.14 \times 10^{-7}\), \(\alpha = 0.0112\), \(\alpha^- = 0.0497\), \(\beta = 0.9583\), and \(\nu = 12.8\). These
values can be compared with the S&P 100 estimates in Table 9.3, for a similar
ten-year period. It is notable that the persistence parameter of the FTSE data is
close to one, as \(\alpha + 0.5\alpha^- + \beta = 0.9944\). The magnitude of the asymmetric
volatility effect is summarized by the ratio \(A = (\alpha + \alpha^-)/\alpha\), which is discussed
in Section 10.2. The estimated ratio is high at 5.4, indicating a substantial level
of asymmetry, which creates negative skewness in multi-period returns. The first
one-day conditional variance, \(h_{m+1}\), equals \(1.86 \times 10^{-4}\), which is equivalent to an
annualized conditional standard deviation equal to 22%. This high level was only
exceeded on 5% of the days in the ten-year estimation period. Thus the illustrative
densities in this chapter have relatively high levels of dispersion.
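
The summary quantities quoted in this paragraph follow directly from the parameter estimates, as the short Python check below shows. The only assumption is the figure of 252 trading days per year used to annualize the variance; the text does not state the exact day count.

```python
import math

alpha, alpha_neg, beta = 0.0112, 0.0497, 0.9583

persistence = alpha + 0.5 * alpha_neg + beta      # about 0.9944
asymmetry_ratio = (alpha + alpha_neg) / alpha     # about 5.4

h_next = 1.86e-4                                  # first one-day conditional variance
trading_days = 252                                # assumed number of trading days per year
annualized_sd = math.sqrt(trading_days * h_next)  # about 0.22, i.e. 22% per annum
```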

A total of N = 100 000 prices are simulated for the index level on 17 March.
The parameter values are those already stated, except that \(\nu\) is increased from 12.8
to 13. Half of the simulated prices are given by an antithetic variable method; if
one price is given by \(z_{m+1}, \ldots, z_{m+n}\), then the other follows from the values
\(-z_{m+1}, \ldots, -z_{m+n}\). The expected value on 17 March from the simulations is
6217, representing an expected rise of 0.84% over four weeks, equivalent to 11%
per annum. The estimated standard deviation of the change in the index level is
\(\hat{\sigma} = 389\) during this period of time. The actual value of the index on 17 March
was 6558, which is 0.88 standard deviations higher than expected. Approximately
81.5% of the simulated index levels are below the actual outcome.


Figure 16.2. A real-world density from an ARCH model. (Horizontal axis: index level on 17 March 2000; vertical axis: density; solid curve: ARCH; dotted curve: lognormal.)


The \(N\) simulated values of \(p_{m+n}\), denoted by \(p^{(i)}\), are smoothed to provide the
kernel estimate


\[ \hat{f}(x) = \frac{1}{Nb}\sum_{i=1}^{N} \phi\!\left(\frac{x - p^{(i)}}{b}\right) \tag{16.4} \]


with \(\phi(\cdot)\) the standard normal density. The bandwidth \(b\) is set equal to 40, which
approximately equals \(\hat{\sigma}/N^{0.2}\). Figure 16.2 shows both the above density and
the lognormal density that matches the mean and the variance of the simulated
log prices. The simulated distribution of \(\log(S_T)\) is slightly skewed to the left
and has a small amount of excess kurtosis. The skewness equals −0.25 and the
negative value occurs because the asymmetry parameter \(\alpha^-\) is positive. The solid
curve in Figure 16.2 represents the kernel estimate that can be compared with the 
dotted curve for the lognormal density; the kernel estimate is higher in the left 
tail because the ARCH density is skewed. Table 16.1 shows the mean, standard 
deviation, skewness, and kurtosis for the ARCH simulations and the matching 
lognormal density. 
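
The kernel estimate and the bandwidth rule are simple to program. The Python sketch below is an illustration only: the function names are ours and the five-point sample is a hypothetical stand-in for the simulated prices. It confirms that the rule \(\hat{\sigma}/N^{0.2}\) gives a bandwidth close to 40 for the values quoted above, and that the estimate integrates to one.

```python
import math

def kernel_density(x, sample, b):
    """Gaussian kernel estimate, as in equation (16.4), evaluated at x."""
    scale = 1.0 / (len(sample) * b * math.sqrt(2.0 * math.pi))
    return scale * sum(math.exp(-0.5 * ((x - p) / b) ** 2) for p in sample)

def bandwidth(sd, n):
    """Rule-of-thumb bandwidth sd / n^0.2."""
    return sd / n ** 0.2

b_text = bandwidth(389.0, 100000)                 # approximately 38.9

# Hypothetical toy sample of simulated index levels, for illustration only.
toy_sample = [5900.0, 6100.0, 6200.0, 6250.0, 6600.0]
grid = [5000.0 + 5.0 * k for k in range(501)]     # 5000, 5005, ..., 7500
area = sum(kernel_density(x, toy_sample, 40.0) for x in grid) * 5.0
```

A larger bandwidth produces a smoother curve at the cost of some bias, so the choice matters most in the tails, where few simulated prices are available.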
Table 16.1. Moments for a selection of real-world density estimation methods.


(Results are given for the density of the FTSE 100 index on 17 March 2000, estimated from
index levels and option prices known four weeks earlier. The ARCH specification is defined
in Section 16.2. The generalized beta distribution is defined in Section 16.5. The three GB2
columns are for the risk-neutral density estimated using option prices and two real-world
densities defined in Section 16.9, which are motivated by utility and calibration theory.)


Density                       ARCH   Lognormal        GB2        GB2        GB2
                                         match        RND    Utility     Calib.
Type                             P           P          Q          P          P

Index statistics
  Mean                        6217        6217       6229       6295       6303
  Standard deviation           389         391        463        434        394
  Skewness                   −0.04        0.19       0.80       0.71       0.67
  Kurtosis                    3.23        3.06       4.37       4.26       4.16

Log(index) statistics
  Mean                       8.733       8.733      8.734      8.745      8.747
  Standard deviation        0.0629      0.0629     0.0774     0.0714     0.0644
  Skewness                   −0.25           0       1.16       1.04       0.96
  Kurtosis                    3.39           3       5.82       5.45       5.14


16.3 Risk-Neutral Density Concepts and Definitions


16.3.1 Preliminary Remarks


The theoretical price of a European option is often written as the discounted 
expectation of the final payoff. This is valid when an appropriate probability dis- 
tribution for the final price of the underlying asset is used. One textbook example 
is the binomial set-up where the asset price is now \(S\) and will be either \(S_T = uS\)
or \(S_T = dS\) when the option expires. The theoretical price of a call option with
exercise price \(X\) is then given by a no-arbitrage argument as


\[ c(X) = e^{-rT}[p\max(uS - X, 0) + (1 - p)\max(dS - X, 0)], \tag{16.5} \]


where p is a risk-neutral probability that prevents arbitrage profits and r is the 
risk-free rate (Hull 2000, Chapter 9). The probability p does not equal the real- 
world chance of the outcome Ar = uS when investors demand a risk premium 
for holding the asset. A second textbook example occurs when prices follow 
geometric Brownian motion and option prices are given by the Black-Scholes 
formula. There is then a lognormal risk-neutral density for Sr, say w(x), for 
which 


c(X) = eT [^ — X)y (x) dx. (16.6) 
X 
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See, for example, Section 14.3 or Hull (2000, Appendix 11A). This risk-neutral 
density is not the real-world density of Sr when investors are risk averse. 
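
The binomial formula (16.5) can be illustrated by a minimal Python sketch. The text
does not state how $p$ is obtained; the sketch assumes the standard no-arbitrage choice
$p = (e^{rT} - d)/(u - d)$ for an asset that pays no dividends, and all numerical inputs
are hypothetical.

```python
import math

def binomial_call(S, X, u, d, r, T):
    """Return (price, p) for the one-period binomial call of equation (16.5)."""
    growth = math.exp(r * T)
    if not d < growth < u:
        raise ValueError("no-arbitrage requires d < exp(rT) < u")
    # Assumed risk-neutral probability of the up move (no dividends).
    p = (growth - d) / (u - d)
    payoff_up = max(u * S - X, 0.0)
    payoff_down = max(d * S - X, 0.0)
    price = (p * payoff_up + (1.0 - p) * payoff_down) / growth
    return price, p

# Hypothetical inputs, chosen only for illustration.
price, p = binomial_call(S=100.0, X=100.0, u=1.1, d=0.9, r=0.05, T=1.0)
# Under p, the discounted expected asset price equals the price now.
discounted_mean = math.exp(-0.05) * (p * 110.0 + (1.0 - p) * 90.0)
print(price, p, discounted_mean)
```

The final check shows why $p$ is called risk-neutral: the asset earns the risk-free rate
under $p$, whatever the real-world chance of the up move may be.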

Theoreticians develop and traders apply pricing formulae that are more com- 
plicated than the above examples. The concept of a risk-neutral density (RND) 
then continues to be applicable. Our interest is in using observed market prices for 
options to infer an implied risk-neutral density. Once we have an implied RND 
we can hope that a simple transformation will give us a useful real-world density. 
This section continues with notation, definitions, and some key theoretical results. 
It is followed in Section 16.4 by general principles for finding implied RNDs and
then by concrete examples of methods and results in Sections 16.5-16.7. The
choice of a best method is discussed in Section 16.8. Transformations that pro- 
vide real-world densities are covered in Section 16.9, leading to Excel examples 
in Section 16.10. 

Bliss and Panigirtzoglou (2002) list several studies that use implied RNDs to 
evaluate market expectations concerning economic and political events, as well as 
asset prices (e.g. Malz 1996; Coutant, Jondeau, and Rockinger 2001; Gemmill and 
Saflekos 2000). Central banks, in particular, are extremely interested in market 
perceptions of price distributions (e.g. Söderlind and Svensson 1997), although
much of their research has not been published. 


16.3.2 Notation and Assumptions 


We follow the notation of Chapter 14. The price of an underlying asset now is $S$
and options expire after $T$ years when the asset price is $S_T$. Prices are assumed
to have continuous distributions. The risk-free rate is constant and equals $r$ and
the asset pays dividends at a constant rate $q$, as discussed in Section 14.2. The
forward price now to buy the asset at time $T$, which excludes arbitrage profits, is

$$F = S e^{(r-q)T}. \tag{16.7}$$

This is also referred to as the futures price and it is a relevant theoretical quantity
even if there is no trade in forward or futures contracts. When the asset is a futures
contract, $q = r$ and $S = F$.

Only European options are considered in the theoretical analysis. We only 
discuss call options because the prices of calls and puts are connected by the 
parity equation (14.2). The exercise price of a general option is X and the call 
price is then denoted by $c(X)$; it is implicit that $c$ also depends on other variables,
such as $S$ and $T$. Any value $X \ge 0$ is permitted, regardless of the finite number
of exercise prices that are traded at real markets.

The functions $\phi(\cdot)$ and $N(\cdot)$ continue to respectively represent the density
function and the cumulative distribution function of the standard normal distribution.


16.3.3 A Definition of the RND


The letter $Q$ indicates that expectations and probabilities are those that apply in a
risk-neutral context. A theoretical risk-neutral density $f_Q$ for $S_T$ is defined here
as the density for which theoretical European option prices are the discounted
expectations of final payoffs; thus,

$$c(X) = e^{-rT} E^Q[(S_T - X)^+] \tag{16.8}$$

$$\phantom{c(X)} = e^{-rT} \int_0^\infty \max(x - X, 0) f_Q(x)\,dx$$

$$\phantom{c(X)} = e^{-rT} \int_X^\infty (x - X) f_Q(x)\,dx, \tag{16.9}$$

for a complete set of exercise prices, i.e. for all $X \ge 0$.

RNDs are defined for all $x \ge 0$. Of course $f_Q(x) \ge 0$ and

$$\int_0^\infty f_Q(x)\,dx = 1,$$

although some empirical estimates violate one or both of these constraints! A call
option that has exercise price zero is almost identical to a forward contract, except
the former requires payment now while the latter involves settlement at time $T$.
A payment of either $c(0)$ now or $F = e^{rT} c(0)$ at time $T$ will obtain the asset at
time $T$. Thus we deduce an important constraint on the RND:

$$F = E^Q[S_T] = \int_0^\infty x f_Q(x)\,dx. \tag{16.10}$$

Any European contingent claim whose payoff at time $T$ is solely a function of $S_T$
can be valued using the RND, which provides further motivation for empirical
work. The fair price, to be paid now, for the payoff $g(S_T)$ is

$$e^{-rT} E^Q[g(S_T)] = e^{-rT} \int_0^\infty g(x) f_Q(x)\,dx. \tag{16.11}$$

The existence and uniqueness of the RND follows from an equation of Breeden
and Litzenberger (1978), assuming $c(X)$ has been defined for all $X \ge 0$ and
hence the market is complete. Any RND then gives the following results, by
differentiating (16.9):

$$\frac{\partial c}{\partial X} = -e^{-rT} \int_X^\infty f_Q(x)\,dx \tag{16.12}$$

and

$$\frac{\partial^2 c}{\partial X^2} = e^{-rT} f_Q(X). \tag{16.13}$$

Thus if any RND exists it must be unique. To demonstrate its existence, begin
with (16.13) and substitute this expression into the integral on the right-hand side
of (16.9). Providing $c(X)$ satisfies weak conditions that prevent arbitrage profits,
$f_Q$ is a density function and the integral simplifies to $e^{rT} c(X)$ as required.
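
The second-derivative result (16.13) can be checked numerically. The following Python
sketch, which is only an illustration, prices calls with the Black-Scholes formula written
on the forward price, recovers the density from a central second difference, and compares
it with the lognormal density. The values $F = 6229$ and $\sigma = 25.9\%$ are taken from
the FTSE illustration later in the chapter; the rate $r = 6\%$, the horizon $T = 28/365$
and the step size are assumptions.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function N(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def black_call(F, X, sigma, r, T):
    """Black-Scholes call price written on the forward price F (q = r, S = F)."""
    sd = sigma * math.sqrt(T)
    d1 = (math.log(F / X) + 0.5 * sd * sd) / sd
    return math.exp(-r * T) * (F * norm_cdf(d1) - X * norm_cdf(d1 - sd))

def lognormal_rnd(x, F, sigma, T):
    """Lognormal risk-neutral density with mean F and log-variance sigma^2 T."""
    sd = sigma * math.sqrt(T)
    z = (math.log(x) - (math.log(F) - 0.5 * sd * sd)) / sd
    return math.exp(-0.5 * z * z) / (x * sd * math.sqrt(2.0 * math.pi))

def implied_density(call, X, h, r, T):
    """f_Q(X) from (16.13): exp(rT) times a central second difference of c."""
    second = (call(X - h) - 2.0 * call(X) + call(X + h)) / (h * h)
    return math.exp(r * T) * second

F, sigma, r, T = 6229.0, 0.259, 0.06, 28.0 / 365.0  # r and T are assumptions
call = lambda X: black_call(F, X, sigma, r, T)
results = []
for X in (5500.0, 6229.0, 7000.0):
    estimate = implied_density(call, X, 5.0, r, T)
    exact = lognormal_rnd(X, F, sigma, T)
    results.append((X, estimate, exact))
    print(X, estimate, exact)
```

With market data the same second difference amplifies price noise, which is one reason
why the smooth parametric and interpolated methods of later sections are preferred.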

16.3.4 Lognormal Example


We have already discussed risk-neutral pricing for the Black-Scholes framework
in Section 14.3, and we now summarize the main results. When real-world prices
follow a geometric Brownian motion process,

$$dS/S = \mu\,dt + \sigma\,dW,$$

and real-world probabilities are obtained from the measure denoted by $P$, then

$$\log(S_T) \overset{P}{\sim} N(\log(S) + \mu T - \tfrac{1}{2}\sigma^2 T, \sigma^2 T). \tag{16.14}$$

Replacing $\mu$ by $r - q$, and using (16.7), gives the risk-neutral distribution

$$\log(S_T) \overset{Q}{\sim} N(\log(F) - \tfrac{1}{2}\sigma^2 T, \sigma^2 T) \tag{16.15}$$

and hence the lognormal RND:

$$\psi(x \mid F, \sigma, T) = \frac{1}{x\sigma\sqrt{2\pi T}} \exp\left(-\frac{1}{2}\left[\frac{\log(x) - [\log(F) - \frac{1}{2}\sigma^2 T]}{\sigma\sqrt{T}}\right]^2\right) = \frac{1}{x\sigma\sqrt{T}}\,\phi(d_2(x)). \tag{16.16}$$

Here $d_2(x)$ is a familiar term from the Black-Scholes formula, (14.8). We use
the above parametrization of the lognormal density in this chapter. Inserting the
density into (16.9) leads to the Black-Scholes formula,

$$c_{BS}(S, T, X, r, q, \sigma) = c_{BS}(F, T, X, r, r, \sigma) = e^{-rT} \int_X^\infty (x - X)\psi(x \mid F, \sigma, T)\,dx. \tag{16.17}$$

This conclusion can be checked by using (14.18).


16.4 Estimation of Implied Risk-Neutral Densities 


An implied volatility provides information about the future dispersion of the asset 
price from one observed option price. An implied risk-neutral density is a far more 
ambitious object—it provides information about the entire distribution of a later 
asset price from several observed option prices. Theory provides few insights 
into an appropriate specification for the RND $f_Q(x)$. Many types of density
functions provide reasonable fits to observed option prices, so there is plenty of
scope for individual preferences. These are apparent in a variety of methods that
are surveyed in Jackwerth (1999) and, to a lesser degree, in Bahra (1997), Cont
(1999), Jondeau and Rockinger (2000), and Bliss and Panigirtzoglou (2002). We
first describe the estimation problem and then introduce some illustrative data.


16.4.1 Three Equivalent Problems

The following three problems are essentially identical.

(i) Specify the RND $f_Q(x)$ for all $x \ge 0$.
(ii) Specify call prices $c(X)$ for all $X \ge 0$.
(iii) Specify implied volatilities $\sigma_{\text{implied}}(X)$ for all $X > 0$.

To see this, note that $f_Q$ gives $c$ from (16.9), while $c$ gives $f_Q$ from (16.13).
Also, any price $c$ within the rational bounds ((14.3) and (14.4)) defines an implied
volatility (and vice versa) by solving

$$c(X) = c_{BS}(S, T, X, r, q, \sigma_{\text{implied}}(X)). \tag{16.18}$$

These are equivalent problems providing it is impossible to make arbitrage profits;
for example, $\sigma_{\text{implied}}(X)$ must not be a function that has
$\partial^2 c/\partial X^2 < 0$ and hence $f_Q(x) < 0$ for some values of $x$.
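
The step from a call price to an implied volatility in (16.18) has no closed form, but
the Black-Scholes price increases with volatility, so a simple bisection search works.
The sketch below is one possible implementation, not a prescribed method. It uses the
forward-price form of the formula ($S = F$, $q = r$), and the price bounds in the code
are the usual forward-price bounds, which we assume correspond to (14.3) and (14.4).
The numerical inputs are hypothetical.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function N(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def black_call(F, X, sigma, r, T):
    """Black-Scholes call price written on the forward price F (q = r, S = F)."""
    sd = sigma * math.sqrt(T)
    d1 = (math.log(F / X) + 0.5 * sd * sd) / sd
    return math.exp(-r * T) * (F * norm_cdf(d1) - X * norm_cdf(d1 - sd))

def implied_vol(price, F, X, r, T, lo=1e-4, hi=5.0, tol=1e-10):
    """Solve (16.18) for sigma by bisection on the interval [lo, hi]."""
    disc = math.exp(-r * T)
    # Assumed forward-price form of the rational bounds on a call price.
    if not disc * max(F - X, 0.0) < price < disc * F:
        raise ValueError("price is outside the no-arbitrage bounds")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if black_call(F, X, mid, r, T) < price:
            lo = mid   # model price too low, so volatility must be higher
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical round trip: price a call at 30% volatility, then recover 30%.
F, X, r, T = 6229.0, 5500.0, 0.06, 28.0 / 365.0
price = black_call(F, X, 0.30, r, T)
recovered = implied_vol(price, F, X, r, T)
print(price, recovered)
```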


Data Issues 


Implied RNDs are extracted from a dataset of $N$ contemporaneous European call
prices, all of which expire after $T$ years. Call $i$ has exercise price $X_i$, option
price $c_m(X_i)$, and implied volatility $\sigma_{m,\text{implied}}(X_i)$. The additional subscript "m"
is employed in this notation to emphasize that the values are given by market
prices.

The original data may be rather different. American option prices can be 
converted to approximately equivalent European prices by inserting American 
implied volatilities into a Black-Scholes formula; approximate American im- 
plieds can be obtained from the formulae of Barone-Adesi and Whaley (1987). 
Any available European put prices should be converted to call prices, using the 
put-call parity equation (14.2). 

After making as many of the above conversions as necessary, we may now have
pairs of option and asset prices, $c'_m(X'_i)$ and $S_i$, for similar but varying times $T_i$ until
expiry. Approximately contemporaneous prices $c_m(X_i)$ are given by inserting the
implieds for the noncontemporaneous data into the Black-Scholes call formula
for suitable fixed values of $S$ and $T$. If we assume that the implied volatility is
only a function of the exercise price divided by the underlying asset price, then
$X_i$ and $c_m(X_i)$ can be defined by

$$\frac{X_i}{S} = \frac{X'_i}{S_i}, \qquad c'_m(X'_i) = c_{BS}(S_i, T_i, X'_i, r, q, \sigma_i), \qquad c_m(X_i) = c_{BS}(S, T, X_i, r, q, \sigma_i). \tag{16.19}$$
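
The adjustment in (16.19) can be sketched in Python as follows. This is an illustration
under stated assumptions rather than the procedure actually used for the LIFFE data:
the function names are hypothetical, the implied volatility is found by bisection, and
the two raw observations are invented (they are generated from the Black-Scholes
formula so that they are internally consistent).

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bs_call(S, T, X, r, q, sigma):
    """Black-Scholes call price for an asset paying dividends at rate q."""
    sd = sigma * math.sqrt(T)
    d1 = (math.log(S / X) + (r - q) * T + 0.5 * sd * sd) / sd
    return S * math.exp(-q * T) * norm_cdf(d1) - X * math.exp(-r * T) * norm_cdf(d1 - sd)

def bs_implied(price, S, T, X, r, q, lo=1e-4, hi=5.0, tol=1e-10):
    """Implied volatility by bisection; assumes the price is within its bounds."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_call(S, T, X, r, q, mid) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def make_contemporaneous(raw, S, T, r, q):
    """Apply (16.19) to raw tuples (price_i, strike_i, S_i, T_i)."""
    adjusted = []
    for price_i, strike_i, S_i, T_i in raw:
        sigma_i = bs_implied(price_i, S_i, T_i, strike_i, r, q)
        X_i = strike_i * S / S_i          # keeps exercise price / asset price fixed
        adjusted.append((X_i, bs_call(S, T, X_i, r, q, sigma_i)))
    return adjusted

# Hypothetical data for options on futures, so q = r and S is the futures price.
S, T, r, q = 6229.0, 28.0 / 365.0, 0.06, 0.06
raw = [
    (bs_call(6200.0, 0.08, 6000.0, r, q, 0.27), 6000.0, 6200.0, 0.08),
    (bs_call(6229.0, T, 6500.0, r, q, 0.22), 6500.0, 6229.0, T),
]
adjusted = make_contemporaneous(raw, S, T, r, q)
print(adjusted)
```

The second observation already has the target values of $S$ and $T$, so it is returned
unchanged, which provides a simple check on the code.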

Figure 16.3. Implied volatilities for FTSE options on 18 February 2000.


Some exercise prices may occur more than once in a dataset. It may be appro- 
priate to retain repeated values. Alternatively, they can be eliminated by only 
keeping the observations that are closest to a particular time. The standard rule 
for any choices between calls and puts that have the same exercise prices is to 
prefer out-of-the-money options. Outliers occur in some option datasets; they can
be detected by checking for violations of boundary conditions and for implied
volatilities that are incompatible with the other observations.


Illustrative Data 


Densities are estimated on 18 February 2000 for the level of the FTSE 100 index 
when the March 2000 options expire at 10:30 on 17 March. Only the prices 
of European options are used here because they were traded more often than 
American options. These options can be valued as options on March futures 
because the options expire when the futures are finally settled. Each option price 
can be matched with an almost contemporaneous futures transaction. 

The European option price data provided by LIFFE contains 80 March trades 
on 18 February and 96 matched pairs of bid and ask quotations. After deleting 
2 trades and 2 quote pairs that are obviously misrecorded, the data have 26 different 
exercise prices for trades and 30 for quotations. There is less noise in the mid-quote 
implied volatilities, when they are plotted against the exercise prices. This is a good 
reason to prefer the mid-quote option prices and they also have a slightly wider 
range of exercise prices. Only one trade observation is retained, corresponding to 
an exercise price that has no quotes. The 31 exercise prices range from 4975 to 
7025 with steps of size 50 between many of them. When an exercise price $X_i$ had
more than one quote during the day we only retain the quote nearest to 12:00.

Equation (16.19) was then used to define contemporaneous option prices at
12:00 on 18 February, when the March futures price was $F = 6229$, after replacing
$S$ by $F$ and $q$ by $r$. The adjusted exercise prices range from 4966 to 7013.
Figure 16.3 displays the implied volatilities, plotted against $X/F$. The dotted and
solid lines are respectively the linear and quadratic functions provided by least
squares estimates. It can be seen that the implieds vary considerably, decreasing
from 40% for deep out-of-the-money puts to 26% for at-the-money options and
to 20% for some out-of-the-money calls. The implied volatility function is almost
linear over $0.80 \le X/F \le 1.05$ but decreases less rapidly to the right of this
range.


16.4.2 Estimation 


The estimation task is to find an appropriate RND $f_Q(x)$ whose pricing formula
$c(X)$ gives an acceptable approximation to observed market prices, thus

$$c_m(X_i) \approx c(X_i) = e^{-rT} \int_{X_i}^\infty (x - X_i) f_Q(x)\,dx, \quad 1 \le i \le N. \tag{16.20}$$

Equivalently, the density $f_Q(x)$ should correspond to an implied volatility
function $\sigma_{\text{implied}}(X)$ that has

$$\sigma_{m,\text{implied}}(X_i) \approx \sigma_{\text{implied}}(X_i), \quad 1 \le i \le N. \tag{16.21}$$

Assuming the $X_i$ are sorted from low to high values, we may be able to obtain
an implied RND that fits well throughout the range from $X_1$ to $X_N$. However,
all estimation methods implicitly use extrapolation to estimate $f_Q(x)$ in the tail
regions, $x < X_1$ and $x > X_N$. Ideally, the estimated risk-neutral probability of the
outcome $X_1 \le S_T \le X_N$ will almost equal one. All methods also use interpolation
between pairs of exercise prices, but this rarely leads to unreasonable estimates
between $X_1$ and $X_N$.

The RND is often a parametric function $f_Q(x \mid \theta)$ of $M$ parameters,
$\theta = (\theta_1, \ldots, \theta_M)$. It is then common to estimate the parameters by minimizing a sum
of squared errors. One criterion is

$$G(\theta) = \sum_{i=1}^{N} (c_m(X_i) - c(X_i \mid \theta))^2,$$

with

$$c(X_i \mid \theta) = e^{-rT} \int_{X_i}^\infty (x - X_i) f_Q(x \mid \theta)\,dx. \tag{16.22}$$

Another plausible possibility is

$$G(\theta) = \sum_{i=1}^{N} (\sigma_{m,\text{implied}}(X_i) - \sigma_{\text{implied}}(X_i \mid \theta))^2. \tag{16.23}$$
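
As a minimal sketch of criterion (16.22), consider the simplest parametric RND, the
lognormal density with the single free parameter $\sigma$. The Python code below
evaluates $G$ on a grid of volatilities and keeps the minimizer. The "market" prices are
hypothetical: they are generated from an assumed downward-sloping implied volatility
function, and $r$ and $T$ are also assumptions. A numerical optimizer would replace the
grid search in practice.

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def black_call(F, X, sigma, r, T):
    """Black-Scholes call price written on the forward price F (q = r, S = F)."""
    sd = sigma * math.sqrt(T)
    d1 = (math.log(F / X) + 0.5 * sd * sd) / sd
    return math.exp(-r * T) * (F * norm_cdf(d1) - X * norm_cdf(d1 - sd))

def criterion(sigma, strikes, market, F, r, T):
    """G of (16.22) when the RND is lognormal with volatility sigma."""
    return sum((m - black_call(F, X, sigma, r, T)) ** 2
               for X, m in zip(strikes, market))

F, r, T = 6229.0, 0.06, 28.0 / 365.0          # r and T are assumptions
strikes = [5500.0 + 250.0 * k for k in range(7)]
# Hypothetical market prices built from a downward-sloping volatility smile.
smile = [0.26 - 0.10 * (X / F - 1.0) for X in strikes]
market = [black_call(F, X, v, r, T) for X, v in zip(strikes, smile)]

grid = [0.10 + 0.001 * k for k in range(401)]  # volatilities from 10% to 50%
best_sigma = min(grid, key=lambda s: criterion(s, strikes, market, F, r, T))
best_G = criterion(best_sigma, strikes, market, F, r, T)
print(best_sigma, best_G)
```

The minimum of $G$ stays above zero because one volatility cannot reproduce a sloping
smile, which anticipates the need for more flexible densities.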

These criteria must be modified when the number of parameters is large relative
to the number of observations, particularly when $M \ge N$. This can be done by
adding a penalty function to $G$ that is higher when the density is less smooth.
Jackwerth and Rubinstein (1996) use a penalty function similar to

$$\lambda \int_0^\infty \left(\frac{\partial^2 f_Q}{\partial x^2}\right)^2 dx \tag{16.24}$$

for some positive constant $\lambda$. General weighted least squares criteria and penalty
functions are discussed by Bliss and Panigirtzoglou (2002).


16.5 Parametric Risk-Neutral Densities 


The discussion of RND specifications is separated into three parts. Parametric 
specifications of the RND and the implied volatility function are respectively 
covered in this section and the next section. Nonparametric specifications are 
then reviewed in Section 16.7. 

The lognormal density function $\psi(x \mid F, \sigma, T)$ is an example of a parametric
RND. As $F$ is the market's forward price for a specific time $T$, the only free
parameter is $\sigma$. One free parameter cannot, however, generate densities that are
sufficiently flexible to explain observed option prices. We cover four parametric
density specifications for the price $S_T$, all of which provide "closed-form"
option pricing formulae. These specifications have between three and five free
parameters. We also note some of their advantages and disadvantages.


16.5.1 Lognormal Mixtures 


A mixture of two lognormal densities is probably the most popular parametric
RND specification and it was first proposed by Ritchey (1990). The mixture
density is a weighted combination of lognormal densities,

$$f_Q(x) = p\,\psi(x \mid F_1, \sigma_1, T) + (1 - p)\,\psi(x \mid F_2, \sigma_2, T). \tag{16.25}$$

This is a density function if $0 \le p \le 1$ and it is an RND if

$$F = pF_1 + (1 - p)F_2.$$

The standard deviation, skewness and kurtosis of $S_T$ can be derived from

$$E[S_T^n] = pF_1^n \exp(\tfrac{1}{2}(n^2 - n)\sigma_1^2 T) + (1 - p)F_2^n \exp(\tfrac{1}{2}(n^2 - n)\sigma_2^2 T). \tag{16.26}$$

There are five parameters in the vector $\theta = (F_1, F_2, \sigma_1, \sigma_2, p)$. The risk-neutrality
constraint reduces the number of free parameters to four, which is sufficient
to obtain a variety of flexible shapes. Figures 16.4 and 16.5 show examples for
one-month densities when $F = 6000$, $F_2 = F_1 + 600$, and $\sigma_2 = 20\%$. On the first
of these figures, $p = 0.25$ and $\sigma_1 = 0.2$, 0.3, or 0.4. On the second, $p = 0.1$,
0.25, or 0.5 and $\sigma_1 = 0.2$. These illustrative densities are skewed to the left, as
occurs in many empirical examples; some are also bimodal.

Figure 16.4. Lognormal mixtures, different volatility levels.

Figure 16.5. Lognormal mixtures, different probabilities p.
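
The moment formula (16.26) makes the summary statistics of a mixture easy to compute.
The Python sketch below is an illustration for one of the cases described for Figure 16.4
($F = 6000$, $F_2 = F_1 + 600$, $p = 0.25$, $\sigma_1 = 0.3$, $\sigma_2 = 0.2$, one month).
The function names and the integration range are our own choices, and the Simpson rule
is used only to confirm that the density integrates to one.

```python
import math

def lognormal_rnd(x, F, sigma, T):
    """Lognormal density psi(x | F, sigma, T) of equation (16.16)."""
    sd = sigma * math.sqrt(T)
    z = (math.log(x) - (math.log(F) - 0.5 * sd * sd)) / sd
    return math.exp(-0.5 * z * z) / (x * sd * math.sqrt(2.0 * math.pi))

def mixture_rnd(x, p, F1, s1, F2, s2, T):
    """Two-lognormal mixture density of equation (16.25)."""
    return p * lognormal_rnd(x, F1, s1, T) + (1.0 - p) * lognormal_rnd(x, F2, s2, T)

def raw_moment(n, p, F1, s1, F2, s2, T):
    """E[S_T^n] from equation (16.26)."""
    k = 0.5 * (n * n - n) * T
    return p * F1**n * math.exp(k * s1 * s1) + (1.0 - p) * F2**n * math.exp(k * s2 * s2)

def summary(p, F1, s1, F2, s2, T):
    """Mean, standard deviation, skewness and kurtosis of S_T."""
    m1, m2, m3, m4 = (raw_moment(n, p, F1, s1, F2, s2, T) for n in (1, 2, 3, 4))
    var = m2 - m1**2
    mu3 = m3 - 3.0 * m1 * m2 + 2.0 * m1**3
    mu4 = m4 - 4.0 * m1 * m3 + 6.0 * m1**2 * m2 - 3.0 * m1**4
    return m1, math.sqrt(var), mu3 / var**1.5, mu4 / var**2

def simpson(f, lo, hi, n):
    """Composite Simpson rule on [lo, hi] with an even number n of intervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(lo + k * h)
    return total * h / 3.0

p, s1, s2, T = 0.25, 0.3, 0.2, 1.0 / 12.0
F = 6000.0
F1 = F - (1.0 - p) * 600.0   # solves F = p*F1 + (1 - p)*(F1 + 600)
F2 = F1 + 600.0
mean, sd, skew, kurt = summary(p, F1, s1, F2, s2, T)
area = simpson(lambda x: mixture_rnd(x, p, F1, s1, F2, s2, T), 1000.0, 12000.0, 4000)
print(mean, sd, skew, kurt, area)
```

The skewness is negative for this case because the more volatile component has the
lower mean.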

A mixture distribution is particularly appropriate for asset prices when the
density of $S_T$ depends on one of two future states that will be determined before
time $T$. For example, $p$ might be the probability that a government is re-elected,
with $\psi(x \mid F_1, \sigma_1, T)$ the density conditional on this event and $\psi(x \mid F_2, \sigma_2, T)$
the density conditional on the event not occurring (Gemmill and Saflekos 2000).
However, mixture densities are generally used as RNDs when there is no obvious
motivation from a set of future states.

Figure 16.6. Three risk-neutral densities.


Mixing lognormal densities is the recipe that makes option prices a mixture of
Black-Scholes prices. As

$$c_{BS}(F, T, X, r, r, \sigma) = e^{-rT} \int_X^\infty (x - X)\psi(x \mid F, \sigma, T)\,dx, \tag{16.27}$$

the theoretical option prices are

$$c(X \mid \theta, r, T) = e^{-rT} \int_X^\infty (x - X)[p\,\psi(x \mid F_1, \sigma_1, T) + (1 - p)\,\psi(x \mid F_2, \sigma_2, T)]\,dx$$

$$\phantom{c(X \mid \theta, r, T)} = p\,c_{BS}(F_1, T, X, r, r, \sigma_1) + (1 - p)\,c_{BS}(F_2, T, X, r, r, \sigma_2). \tag{16.28}$$

It is usually fairly easy to estimate the RND parameters, by minimizing one of
the functions defined in Section 16.4, although difficulties locating the global
minimum have been reported (Jondeau and Rockinger 2000; Coutant et al. 2001; Bliss
and Panigirtzoglou 2002). It is therefore advisable to compare the optimization
results obtained from several initial values. The constraint that the variables $F_1$,
$F_2$, $\sigma_1$, $\sigma_2$ are all positive can usually be omitted from the optimization problem.
Note, though, that there are two solutions to the estimation problem because the
numerical values of $(F_1, \sigma_1, p)$ and $(F_2, \sigma_2, 1 - p)$ can always be interchanged.

The parameter estimates for the illustrative FTSE 100 March options on 18
February 2000 are $p = 23.8\%$, $F_1 = 5735$, $\sigma_1 = 31.1\%$, $F_2 = 6383$, and
$\sigma_2 = 18.1\%$. They are obtained by minimizing the sum of squared errors for 31
call prices, giving a minimum value of 175 for the function $G$ defined by (16.22).
The estimated standard deviation of the price errors equals 2.5, which is less
than the average bid-ask spread. Figure 16.6 shows the estimated RND as a solid
curve. The density is skewed to the left because the higher standard deviation is
associated with the lower of the two lognormal means. Figure 16.6 also shows the
density estimate when the RND is lognormal, with $\sigma = 25.9\%$, represented by a
light line. Compared with the lognormal, the mixture density is seen to have more
density in the left tail and less in the right tail. Figure 16.7 shows the estimated
implied volatility function, again as a solid curve. The fit is satisfactory except
for the exercise prices furthest from the futures price. Table 16.2 includes the
mean, standard deviation, skewness, and kurtosis for both $S_T$ and $\log(S_T)$. The
estimated probabilities beyond the minimum and maximum exercise prices are
also tabulated. They equal 1.2% and 2.4%, so that more than 96% of the estimated
probability is for index values within the range of the traded exercise prices.

Figure 16.7. Fitted implieds for two RND methods.
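
The reported mixture estimates can be explored with a short Python sketch of the
pricing formula (16.28) and the associated tail probabilities. The parameter values
are those quoted above and the tail boundaries are the adjusted exercise prices 4966
and 7013. The horizon $T = 28/365$ (four weeks) and the rate $r = 6\%$ are assumptions,
as are the function names, so the output should only be expected to agree approximately
with the tabulated figures.

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def black_call(F, X, sigma, r, T):
    """Black-Scholes call price written on the forward price F (q = r, S = F)."""
    sd = sigma * math.sqrt(T)
    d1 = (math.log(F / X) + 0.5 * sd * sd) / sd
    return math.exp(-r * T) * (F * norm_cdf(d1) - X * norm_cdf(d1 - sd))

def mixture_call(X, p, F1, s1, F2, s2, r, T):
    """Call price (16.28): a weighted sum of two Black-Scholes prices."""
    return p * black_call(F1, X, s1, r, T) + (1.0 - p) * black_call(F2, X, s2, r, T)

def lognormal_cdf(x, F, sigma, T):
    """Risk-neutral P(S_T <= x) when S_T has the lognormal density (16.16)."""
    sd = sigma * math.sqrt(T)
    return norm_cdf((math.log(x / F) + 0.5 * sd * sd) / sd)

def mixture_cdf(x, p, F1, s1, F2, s2, T):
    """Risk-neutral P(S_T <= x) for the mixture density (16.25)."""
    return p * lognormal_cdf(x, F1, s1, T) + (1.0 - p) * lognormal_cdf(x, F2, s2, T)

p, F1, s1, F2, s2 = 0.238, 5735.0, 0.311, 6383.0, 0.181   # estimates from the text
r, T = 0.06, 28.0 / 365.0                                  # assumptions

forward = p * F1 + (1.0 - p) * F2                # risk-neutrality constraint
below = mixture_cdf(4966.0, p, F1, s1, F2, s2, T)
above = 1.0 - mixture_cdf(7013.0, p, F1, s1, F2, s2, T)
at_the_money = mixture_call(6229.0, p, F1, s1, F2, s2, r, T)
print(forward, below, above, at_the_money)
```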

Lognormal mixtures have been estimated for interest rates (Bahra 1997; Söder- 
lind and Svensson 1997; Coutant et al. 2001), exchange rates (Campa et al. 1998; 
Jondeau and Rockinger 2000), and equity indices (Gemmill and Saflekos 2000; 
Bliss and Panigirtzoglou 2002; Anagnou, Bedendo, Hodges, and Tompkins 2002; 
Liu, Shackleton, Taylor, and Xu 2004). A mixture of three lognormals has seven 
free parameters. It is estimated by Melick and Thomas (1997) for the prices of 
crude oil futures during the Gulf War in 1990 and 1991. They motivate the mixture 
by uncertainty about the future supply of Kuwaiti oil. 

The mixture method is fairly easy to apply, it guarantees a nonnegative estimated
density, and it is intuitive when $p$ can be identified with the probability of a relevant
future event. The estimation of four free parameters may, however, be excessive,
leading to estimates that are sensitive to the discreteness of option prices (Bliss
and Panigirtzoglou 2002). Another potential shortcoming is that estimated RNDs
may be bimodal and hence may be counterintuitive.

Table 16.2. Moments for a selection of risk-neutral density estimation methods.

(Results are given for the density of the FTSE 100 index on 17 March 2000, estimated from
option prices four weeks earlier. G is the minimum of the sum of squared option pricing errors,
across 31 exercise prices that range from 4975 to 7025. The six methods are the lognormal,
a mixture of two lognormals, the generalized beta, the lognormal-polynomial, and linear and
quadratic implied volatility functions.)

                                        Logn.              Logn.    IVF      IVF
                               Logn.    mixture   GB2      poly.    linear   quad.

Minimum value of G             5740     175       118      241      171      115

Index statistics
  Mean                         6229     6229      6229     6229     6229     6229
  Standard deviation           447      460       463      463      460      467
  Skewness                     0.22     -0.66     -0.80    -0.63    -0.79    -0.98
  Kurtosis                     3.08     3.71      4.37     4.02     4.02     5.66
  Probability below lowest X   0.1%     1.2%      1.4%     1.5%     1.3%     1.5%
  Probability above highest X  4.6%     2.4%      2.1%     2.3%     1.3%     1.8%

Log(index) statistics
  Mean                         8.734    8.734     8.734    8.734    8.734    8.734
  Standard deviation           0.0717   0.0764    0.0774   0.0769   0.0767   0.0790
  Skewness                     0        -0.93     -1.16    -0.93    -1.11    -1.58
  Kurtosis                     3        4.30      5.82     4.50     5.26     10.48


16.5.2 The GB2 Distribution 


Four parameters are required to obtain general combinations of the mean,
variance, skewness, and kurtosis of future asset prices. Bookstaber and McDonald
(1987) and McDonald and Bookstaber (1991) propose and apply the generalized
beta distribution of the second kind, called GB2, that has four positive parameters,
$a$, $b$, $p$, and $q$. The density function for $S_T$ is

$$f_{GB2}(x \mid a, b, p, q) = \frac{a x^{ap-1}}{b^{ap} B(p, q)[1 + (x/b)^a]^{p+q}}, \quad x > 0. \tag{16.29}$$

The $B$ function is defined in terms of the gamma function by

$$B(p, q) = \Gamma(p)\Gamma(q)/\Gamma(p + q). \tag{16.30}$$

Bookstaber and McDonald (1987) describe several special cases, including a
lognormal limit when $a \to 0$ and $q \to \infty$ in a particular way.

Multiplication of the GB2 density by $x^n$ defines a function that is proportional
to another GB2 density, in which $p$ is replaced by $p + (n/a)$ and $q$ is replaced
by $q - (n/a)$:

$$x^n f_{GB2}(x \mid a, b, p, q) = \left[\frac{b^n B(p + n/a, q - n/a)}{B(p, q)}\right] f_{GB2}(x \mid a, b, p + n/a, q - n/a), \tag{16.31}$$
providing $n < aq$. This is a very useful property. It permits a simple and
economically meaningful transformation of a GB2 Q-density into a GB2 P-density,
as we will see in Section 16.9. It also leads to the following expression for the
moments of the distribution:

$$E[S_T^n] = \frac{b^n B(p + n/a, q - n/a)}{B(p, q)}. \tag{16.32}$$

Substituting $n = 1$ gives the constraint that ensures the density is risk-neutral,
assuming $aq > 1$:

$$F = \frac{b B(p + 1/a, q - 1/a)}{B(p, q)}. \tag{16.33}$$

This result shows that $b$ is a scale parameter. We may regard $a$, $p$, and $q$ as the
free parameters and then derive $b$ from $F$ and the above constraint. It is difficult
to interpret the free parameters. Note that moments do not exist when $n \ge aq$.
The kurtosis of $S_T$ is therefore infinite when $aq \le 4$.
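
The GB2 density (16.29), the moment formula (16.32), and the constraint (16.33) can be
sketched in Python using only the log-gamma function. The code below is an illustration:
the function names are our own, the shape parameters are the FTSE estimates reported
later in this section, and the integration range $[0.2b, 3b]$ is an assumption that is
adequate for these parameter values but not in general.

```python
import math

def log_beta(p, q):
    """log B(p, q) from equation (16.30)."""
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)

def gb2_pdf(x, a, b, p, q):
    """GB2 density (16.29), evaluated through logarithms to avoid overflow."""
    if x <= 0.0:
        return 0.0
    log_ratio = a * math.log(x / b)                  # log of (x/b)^a
    log_f = (math.log(a) + p * log_ratio - math.log(x) - log_beta(p, q)
             - (p + q) * math.log1p(math.exp(log_ratio)))
    return math.exp(log_f)

def gb2_moment(n, a, b, p, q):
    """E[S_T^n] from equation (16.32); the moment exists only if n < a*q."""
    if n >= a * q:
        raise ValueError("moment does not exist when n >= a*q")
    return b**n * math.exp(log_beta(p + n / a, q - n / a) - log_beta(p, q))

def scale_from_forward(F, a, p, q):
    """Scale parameter b implied by the risk-neutrality constraint (16.33)."""
    return F * math.exp(log_beta(p, q) - log_beta(p + 1.0 / a, q - 1.0 / a))

def simpson(f, lo, hi, n):
    """Composite Simpson rule on [lo, hi] with an even number n of intervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(lo + k * h)
    return total * h / 3.0

a, p, q, F = 27.0, 0.59, 2.37, 6229.0
b = scale_from_forward(F, a, p, q)
sd = math.sqrt(gb2_moment(2, a, b, p, q) - gb2_moment(1, a, b, p, q) ** 2)
area = simpson(lambda x: gb2_pdf(x, a, b, p, q), 0.2 * b, 3.0 * b, 8000)
mean_numeric = simpson(lambda x: x * gb2_pdf(x, a, b, p, q), 0.2 * b, 3.0 * b, 8000)
print(b, sd, area, mean_numeric)
```

Deriving $b$ from $F$ in this way leaves $a$, $p$, and $q$ as the only parameters that an
optimizer has to search over.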

Option prices now depend on the cumulative distribution function (c.d.f.) of the
GB2 distribution, denoted by $F_{GB2}$. This function can be evaluated using the c.d.f.
of the beta distribution, denoted by $F_\beta$, which is the incomplete beta function:

$$F_\beta(u \mid p, q) = \frac{1}{B(p, q)} \int_0^u t^{p-1}(1 - t)^{q-1}\,dt. \tag{16.34}$$

A change of variable inside an integral shows that

$$F_{GB2}(x \mid a, b, p, q) = F_{GB2}((x/b)^a \mid 1, 1, p, q) = F_\beta(u(x, a, b) \mid p, q) \tag{16.35}$$

with the function $u$ defined by

$$u(x, a, b) = \frac{(x/b)^a}{1 + (x/b)^a}. \tag{16.36}$$

Call prices are then as follows, assuming the four parameters are constrained by
(16.33):

$$c(X) = F e^{-rT}[1 - F_{GB2}(X \mid a, b, p + a^{-1}, q - a^{-1})] - X e^{-rT}[1 - F_{GB2}(X \mid a, b, p, q)]$$

$$\phantom{c(X)} = F e^{-rT}[1 - F_\beta(u(X, a, b) \mid p + a^{-1}, q - a^{-1})] - X e^{-rT}[1 - F_\beta(u(X, a, b) \mid p, q)]. \tag{16.37}$$

Figure 16.8. Further risk-neutral densities. (Curves: lognormal, GB2, and quadratic IVF;
horizontal axis: index level on 17 March 2000; vertical axis: density.)

Estimation of the parameter vector, $\theta = (a, b, p, q)$, by minimizing one of
the functions defined in Section 16.4, is again fairly straightforward. One method
is to minimize over $a$, $p$, and $q$, with $a > 0$, $p > 0$, $aq > 1$, and $b$ given
by (16.33). In Excel, the $B$ function can be evaluated using three values of
GAMMALN(z) which calculates $\log(\Gamma(z))$, while the function $F_\beta(u \mid p, q)$
equals BETADIST(u, p, q). The only technical problem is selecting the initial
values when performing the parameter estimation.
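
The call price formula (16.37) can be checked against direct integration of the payoff.
The Python sketch below is an illustration only. It evaluates $F_{GB2}$ by Simpson
quadrature of the density, rather than through an incomplete beta routine (the function
`scipy.special.betainc` would be the natural counterpart of BETADIST), and it starts the
quadrature at $0.2b$ on the assumption that the probability below this level is negligible,
which holds for the parameter values used here. The values of $r$, $T$, and $X$ are assumptions.

```python
import math

def log_beta(p, q):
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)

def gb2_pdf(x, a, b, p, q):
    """GB2 density (16.29), evaluated through logarithms."""
    log_ratio = a * math.log(x / b)
    log_f = (math.log(a) + p * log_ratio - math.log(x) - log_beta(p, q)
             - (p + q) * math.log1p(math.exp(log_ratio)))
    return math.exp(log_f)

def simpson(f, lo, hi, n):
    """Composite Simpson rule on [lo, hi] with an even number n of intervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(lo + k * h)
    return total * h / 3.0

def gb2_cdf(x, a, b, p, q, n=4000):
    """F_GB2(x) by quadrature from 0.2*b; the mass below is assumed negligible."""
    lo = 0.2 * b
    if x <= lo:
        return 0.0
    return simpson(lambda y: gb2_pdf(y, a, b, p, q), lo, x, n)

def gb2_call(X, F, r, T, a, p, q):
    """Call price from the first form of (16.37), with b set by (16.33)."""
    b = F * math.exp(log_beta(p, q) - log_beta(p + 1.0 / a, q - 1.0 / a))
    tail_shifted = 1.0 - gb2_cdf(X, a, b, p + 1.0 / a, q - 1.0 / a)
    tail = 1.0 - gb2_cdf(X, a, b, p, q)
    return math.exp(-r * T) * (F * tail_shifted - X * tail), b

a, p, q = 27.0, 0.59, 2.37                          # estimates reported in the text
F, r, T, X = 6229.0, 0.06, 28.0 / 365.0, 6000.0    # r, T and X are assumptions
price, b = gb2_call(X, F, r, T, a, p, q)
# Direct evaluation of the discounted expected payoff, as in (16.9).
direct = math.exp(-r * T) * simpson(
    lambda x: (x - X) * gb2_pdf(x, a, b, p, q), X, 3.0 * b, 8000)
print(price, direct)
```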

The parameter estimates for the illustrative FTSE 100 dataset are $a = 27$,
$b = 6750$, $p = 0.59$, and $q = 2.37$. The minimum sum of squared errors is
included in Table 16.2. It remains less than the minimum for the lognormal mixture
even when either $p$ or $q$ is constrained to be one. Figures 16.8 and 16.9 show the
estimated GB2 risk-neutral density and implied volatility functions, using solid
curves. The GB2 and mixture densities are similar but the GB2 provides a much
better fit to the observed implied volatilities.

The RND for the general GB2 distribution has been estimated for the S&P 500 
and the sterling/dollar rate by Anagnou et al. (2002). It has also been estimated for 
S&P 500 futures (Aparicio and Hodges 1998) and the spot FTSE 100 index (Liu 
et al. 2004). The special case when $q = 1$ is the Burr-3 distribution, which has
been estimated for soybean futures by Sherrick, Garcia, and Tirupattur (1996). 

The GB2 density is easy to estimate, a nonnegative density is guaranteed, and 
we will later see a convenient transformation from an RND to a real-world density. 
The major obstacle to its use is the interpretation of the parameters a, p, and q. 


16.5.3 Lognormal-Polynomial Density Functions 


Standardized returns have standard normal distributions when prices have
lognormal distributions. Madan and Milne (1994) develop an elegant theory of
contingent claims valuation that assumes the density of standardized returns is the
standard normal density multiplied by a general function. This function can be
approximated by a polynomial. The density of prices is then a lognormal density
multiplied by a polynomial function of $\log(x)$.

Figure 16.9. Fitted implieds for two more RND methods.

This method involves more mathematics than the others. All the following
distributions are for the risk-neutral measure $Q$. We begin by supposing the change
in the logarithm of the futures price, $\log(F_T) - \log(F)$, has finite variance $\sigma^2 T$
and mean equal to $\mu T - \frac{1}{2}\sigma^2 T$. This defines two parameters $\mu$ and $\sigma$. Then we
define the standardized "futures return" $Z$ by

$$Z = \frac{\log(F_T/F) - (\mu T - \frac{1}{2}\sigma^2 T)}{\sigma\sqrt{T}} \tag{16.38}$$

so that $Z$ has mean 0 and variance 1. The densities of $Z$ and $S_T = F_T$ are related
by

$$f_{S_T}(x) = \frac{1}{x\sigma\sqrt{T}} f_Z(z) \tag{16.39}$$

with

$$z = \frac{\log(x) - \log(F) - \mu T + \frac{1}{2}\sigma^2 T}{\sigma\sqrt{T}}.$$

Of course $Z \sim N(0, 1)$ and $\mu = 0$ when the distribution of $F_T$ is lognormal.
The general form of the density for $Z$ is

$$f_Z(z) = \phi(z) \sum_{j=0}^{\infty} b_j H_j(z) \tag{16.40}$$

for constants $b_j$ and normalized Hermite polynomials $H_j$, commencing with

$$H_0(z) = 1, \quad H_1(z) = z, \quad H_2(z) = \frac{1}{\sqrt{2}}(z^2 - 1), \quad H_3(z) = \frac{1}{\sqrt{6}}(z^3 - 3z), \quad H_4(z) = \frac{1}{\sqrt{24}}(z^4 - 6z^2 + 3). \tag{16.41}$$

These polynomials are "orthogonal," a property here defined by integrals as

$$\int_{-\infty}^{\infty} H_i(z) H_j(z) \phi(z)\,dz = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \ne j. \end{cases} \tag{16.42}$$

It follows that

$$\int_{-\infty}^{\infty} H_j(z) f_Z(z)\,dz = b_j. \tag{16.43}$$

Since $f_Z(z)$ is the density of a standardized random variable,

$$b_0 = 1, \quad b_1 = 0, \quad b_2 = 0, \quad \text{skewness}(Z) = E[Z^3] = \sqrt{6}\,b_3, \quad \text{kurtosis}(Z) = E[Z^4] = 3 + \sqrt{24}\,b_4. \tag{16.44}$$

The coefficients $b_j$, $j \ge 3$, are constrained because the density of $F_T$ is
risk-neutral, as is shown later in equations (16.47) and (16.51).

Most implementations of the method assume the ratio $f_Z(z)/\phi(z)$ is a
polynomial of order four. Then $b_j = 0$ for $j \ge 5$ and the parameter vector becomes
$\theta = (\mu, \sigma, b_3, b_4)$. The risk-neutrality constraint reduces the number of free
parameters to three. It is important to appreciate that $b_3$ and $b_4$ must then be
constrained to ensure the density is never negative. Jondeau and Rockinger (2001)
show the kurtosis of $Z$ is between three and seven when the distribution is symmetric
and that the maximum kurtosis is less for skewed distributions. The permissible
range of skewness values for $Z$ depends on the kurtosis, with all feasible values
within $\pm 1.05$.
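
The need for constraints on $b_3$ and $b_4$ can be seen from a short Python sketch of the
fourth-order density ratio $f_Z(z)/\phi(z) = 1 + b_3 H_3(z) + b_4 H_4(z)$. The code is an
illustration: it scans a grid of $z$ values, whose range and spacing are our own choices,
and reports where the ratio is negative. The coefficients used are the unconstrained
FTSE estimates quoted later in this section.

```python
import math

def hermite(j, z):
    """Normalized Hermite polynomials H_0 to H_4 of equation (16.41)."""
    if j == 0:
        return 1.0
    if j == 1:
        return z
    if j == 2:
        return (z * z - 1.0) / math.sqrt(2.0)
    if j == 3:
        return (z**3 - 3.0 * z) / math.sqrt(6.0)
    if j == 4:
        return (z**4 - 6.0 * z * z + 3.0) / math.sqrt(24.0)
    raise ValueError("only orders 0 to 4 are implemented")

def density_ratio(z, b3, b4):
    """f_Z(z)/phi(z) for a fourth-order polynomial with b0 = 1 and b1 = b2 = 0."""
    return 1.0 + b3 * hermite(3, z) + b4 * hermite(4, z)

def negative_region(b3, b4, lo=-6.0, hi=6.0, steps=1200):
    """Grid points at which the implied density of Z is negative."""
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return [z for z in grid if density_ratio(z, b3, b4) < 0.0]

b3, b4 = -0.397, 0.217                      # unconstrained estimates from the text
skewness = math.sqrt(6.0) * b3              # equation (16.44)
kurtosis = 3.0 + math.sqrt(24.0) * b4       # equation (16.44)
bad = negative_region(b3, b4)
print(skewness, kurtosis, min(bad), max(bad))
```

A check of this kind, applied on a suitable grid inside the optimization, is one way to
impose the nonnegativity constraints.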

The payoff from a call option is a function of $Z$, which can be represented as
an infinite-order polynomial by

$$(F_T - X)^+ = \sum_{k=0}^{\infty} a_k(X) H_k(Z). \tag{16.45}$$

The functions $a_j(X)$ do not depend on $Z$. Therefore, by applying the orthogonality
property (16.42), we can obtain

$$c(X) = e^{-rT} E[(F_T - X)^+] = e^{-rT} \sum_{j=0}^{\infty} a_j(X) b_j. \tag{16.46}$$

Likewise, the futures price is given by

$$F = E[F_T] = \sum_{j=0}^{\infty} a_j(0) b_j. \tag{16.47}$$

Implementation requires $f_Z(z)/\phi(z)$ to be a polynomial of finite order $J$, so
that $b_j = 0$ for $j > J$. The usual choice is $J = 4$. The required functions $a_j(X)$
are then

$$a_0(0) = F e^{\mu T}, \quad a_3(0) = \frac{\beta^3}{\sqrt{6}} F e^{\mu T}, \quad a_4(0) = \frac{\beta^4}{\sqrt{24}} F e^{\mu T}, \tag{16.48}$$

and, for $X > 0$,

$$a_0(X) = F e^{\mu T} N(D_1) - X N(D_1 - \sigma\sqrt{T}),$$

$$a_3(X) = \frac{1}{\sqrt{6}} F e^{\mu T} \beta [\beta^2 N(D_1) + (2\beta - D_1)\phi(D_1)],$$

$$a_4(X) = \frac{1}{\sqrt{24}} F e^{\mu T} \beta [\beta^3 N(D_1) + (3\beta^2 - 3\beta D_1 + D_1^2 - 1)\phi(D_1)] \tag{16.49}$$

with

$$\beta = \sigma\sqrt{T} \quad \text{and} \quad D_1(X) = \frac{\log(F/X) + (\mu + \frac{1}{2}\sigma^2)T}{\sigma\sqrt{T}}. \tag{16.50}$$

These formulae can be derived from equations in Madan and Milne (1994).

Assuming $J = 4$, the parameter vector $\theta = (\mu, \sigma, b_3, b_4)$ is estimated by
minimizing one of the functions suggested in Section 16.4, with the risk-neutrality
constraint

$$1 + \frac{\beta^3 b_3}{\sqrt{6}} + \frac{\beta^4 b_4}{\sqrt{24}} = e^{-\mu T}. \tag{16.51}$$

Further constraints may also be required to exclude negative density estimates,
as discussed by Jondeau and Rockinger (2001).

The parameter estimates for the illustrative FTSE 100 dataset include $b_3 = -0.397$
and $b_4 = 0.217$ when $b_3$ and $b_4$ are not constrained. The density of $Z$ is
then negative for $2.4 < z < 4.0$, corresponding to a narrow range of asset price
levels beyond the highest exercise price, between 7600 and 7700. Adding constraints
that ensure the density of $Z$ is not negative on a suitable grid leads to the
estimates $\mu = 8.88 \times 10^{-4}$, $\sigma = 0.278$, $b_3 = -0.379$, and $b_4 = 0.308$. The sum
of squared errors, $G$, then equals 241 compared with 161 for the unconstrained
optimization. Figures 16.6 and 16.7 show the estimated lognormal-polynomial
risk-neutral density and implied volatility functions, using dotted curves. The
curves are similar for the lognormal-polynomial and the lognormal mixture
specifications, although the density of the polynomial variety is less smooth for index
levels around 5500.

Lognormal-polynomial density functions are estimated by Madan and Milne
(1994) and Ané (1999) for the S&P 500 index, Jondeau and Rockinger (2000, 
2001) for the French franc rate against the Deutsche mark, and Coutant et al. 
(2001) for French interest rates. The method has strong theoretical foundations 
and is fairly easy to implement. However, negative densities can often only be 
avoided by restricting the levels of skewness and kurtosis permitted in the density 
functions. 

Similar but more complicated functions of lognormal and polynomial terms are 
given by the Edgeworth expansion method of Jarrow and Rudd (1982). Details and 
examples can be found in Corrado and Su (1996, 1997), Jondeau and Rockinger 
(2000), and Brown and Robinson (2002). 


16.5.4 Densities from Stochastic Volatility Processes 


Any risk-neutral specification of the process followed by asset prices has the
potential to yield estimates of RNDs. A realistic specification will incorporate
stochastic volatility. Quick density estimates will follow if the formula for
option prices has a "closed form." A plausible asset price process is therefore
the stochastic volatility diffusion process of Heston (1993), discussed in
Section 14.6. The risk-neutral dynamics for the asset price $S_t$ and the
variance $V_t$ are then

$$
\begin{aligned}
d(\log S) &= (r - q - \tfrac{1}{2}V)\,dt + \sqrt{V}\,dW, \\
dV &= (a - bV)\,dt + \xi\sqrt{V}\,dZ,
\end{aligned}
\tag{16.52}
$$

with correlation $\rho$ between $dW$ and $dZ$. Numerical integration provides
both the option pricing formula $c(X)$ and the risk-neutral density $f_Q(x)$, as
functions of the parameter vector $\theta = (a, b, \xi, \rho, V_0)$ and the
observable quantities $S$, $r$, $q$. The pricing formula is

$$ c(X) = S e^{-qT} P_1(X) - X e^{-rT} P_2(X), \tag{16.53} $$

with the probabilities $P_1(X)$, $P_2(X)$, and the density $f_Q(x)$ given by
integrals stated in the appendix to Chapter 14.

There are five free parameters in the pricing formula and the density function.
This may be an excessive number if the parameters are estimated from option
prices for one expiry time $T$. As the parameters are the same for all $T$, it
is logical to estimate them from a matrix of option prices that combines several
values of $X$ with several values of $T$. This cannot be done for most of the
other methods for estimating RNDs. Jondeau and Rockinger (2000) tabulate
parameter estimates for the FF/DM exchange rate on two days. The estimates from
single values of $T$ are similar for $a$ and $\rho$, but are rather variable for
$b$, $\xi$, and particularly $V_0$. The joint estimates when $T$ is one, three,
six, and twelve months are more plausible.


16.6 Risk-Neutral Densities from Implied Volatility Functions


Implied volatilities deviate from a constant function when the RND deviates from
a lognormal density. Thus it may be easier to specify an implied volatility
function (IVF) than an RND. Also, implied volatilities are directly observable,
which makes IVF estimation attractive.


16.6.1 Theory 


The simpler notation $\sigma(X \mid \theta)$ is now used for the IVF,
$\sigma_{\text{implied}}(X)$, with $\theta$ a set of parameters. Then the call
price formula is

$$ c(X \mid \theta) = c_{BS}(S, T, X, r, q, \sigma(X \mid \theta)). \tag{16.54} $$

The function $\sigma(X \mid \theta)$ is often assumed to be a polynomial. Shimko
(1993) was the first to suggest a quadratic,

$$ \sigma(X \mid \theta) = a + bX + cX^2, \tag{16.55} $$

so that $\theta = (a, b, c)$.
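When the call price formula does not permit arbitrage profits, the RND follows
from (16.13) as

$$ f_Q(X) = e^{rT}\frac{\partial^2 c}{\partial X^2}. \tag{16.56} $$

An analytic RND expression follows by differentiating (16.54), to give

$$ e^{rT}\frac{\partial c}{\partial X} = -N(d_2) + X\sqrt{T}\,\phi(d_2)\frac{\partial\sigma}{\partial X} \tag{16.57} $$

and

$$
e^{rT}\frac{\partial^2 c}{\partial X^2}
= \phi(d_2)\left\{ \frac{1}{X\sigma\sqrt{T}}
+ \frac{2d_1}{\sigma}\frac{\partial\sigma}{\partial X}
+ \frac{d_1 d_2 X\sqrt{T}}{\sigma}\left(\frac{\partial\sigma}{\partial X}\right)^2
+ X\sqrt{T}\,\frac{\partial^2\sigma}{\partial X^2} \right\},
\tag{16.58}
$$

with $d_1$ and $d_2$ the Black-Scholes functions of $X$ defined by

$$
d_1(X) = \frac{\log(F/X) + \tfrac{1}{2}\sigma(X)^2 T}{\sigma(X)\sqrt{T}},
\qquad
d_2(X) = d_1(X) - \sigma(X)\sqrt{T}.
\tag{16.59}
$$

The partial derivatives are zero and the density is lognormal (see (16.16)) when
the IVF is a constant. The calculation of the density can always be checked
against the numerical approximation

$$ f_Q(X) \approx e^{rT}\,\frac{c(X+\delta) - 2c(X) + c(X-\delta)}{\delta^2}, \tag{16.60} $$

with $\delta$ a small fraction of $X$.

The density obtained from (16.56) is automatically risk-neutral, if weak
constraints apply to $\sigma(X \mid \theta)$. This can be checked using
integration by parts:

$$
\begin{aligned}
\int_0^\infty X\frac{\partial^2 c}{\partial X^2}\,dX
&= \left[X\frac{\partial c}{\partial X}\right]_0^\infty
 - \int_0^\infty \frac{\partial c}{\partial X}\,dX \\
&= [0 - 0] - [c]_0^\infty \\
&= -[0 - Fe^{-rT}] = Fe^{-rT},
\end{aligned}
\tag{16.61}
$$

so that multiplying by $e^{rT}$, as in (16.56), gives a density whose
expectation is $F$.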


16.6.2 Implementation 


The original strategy for implementing the IVF method is to select a parametric
function $\sigma(X \mid \theta)$, guided by inspection of observed implieds,
that has a few parameters (Shimko 1993). Plausible functions, including
quadratics, are not guaranteed to give nonnegative densities and implieds for
all positive $X$. These conditions usually have to be checked. In some cases it
is sufficient that they apply for a range of values over which the integrals of
$f_Q(x)$ and $x f_Q(x)$ are respectively very near to one and the forward price.
Lee (2004) offers advice about extrapolation of the IVF.

A linear IV function provides a good fit to the illustrative option prices, with
$a = 0.870$ and $b = -0.977 \times 10^{-4}$. To prevent negative values, this
function cannot be extrapolated beyond $X = 8900$; this is unimportant because
the linear IVF defines an adequate density function over the interval from 0 to
8000. The minimized sum of squared price errors equals 171 for this
two-parameter specification, which compares well with the 175 for a lognormal
mixture that has four free parameters. A quadratic IV function reduces $G$ to
115, which is similar to the 118 obtained by the GB2 method. The parameter
estimates are then $a = 1.78$, $b = -3.93 \times 10^{-4}$, and
$c = 2.40 \times 10^{-8}$, for which the IVF and the RND are always positive;
the minimum value of the quadratic is 0.168 at $X = 8200$. Figures 16.8 and 16.9
show the estimated risk-neutral density and quadratic implied volatility
functions, using dotted curves. For our data, these functions are very similar
to those for the GB2 distribution within the range of traded exercise prices.
The quadratic IVF density has a long left tail and the most negative skewness of
all the methods. Table 16.2 includes summary statistics for the IVF method and
the three parametric methods described in the previous section. Section 16.10
illustrates the quadratic IVF calculations using an Excel spreadsheet.
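
The quadratic IVF calculation can be sketched in a few lines of Python. The
sketch below evaluates the analytic density (16.58) and compares it with the
second-difference approximation (16.60). The coefficients are the quadratic
estimates quoted above; the forward price, expiry time, and interest rate
(`F`, `T`, `R`) are hypothetical values chosen only for illustration, and all
function names are assumptions rather than anything defined in the text.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Quadratic IVF estimates quoted in the text.
A, B, C = 1.78, -3.93e-4, 2.40e-8
# Hypothetical market inputs (NOT taken from the text).
F, T, R = 6200.0, 4.0 / 52.0, 0.06

def ivf(x):
    """Quadratic implied volatility function, as in (16.55)."""
    return A + B * x + C * x * x

def call_price(x):
    """Black-Scholes call on the forward F with volatility ivf(x), as in (16.54)."""
    s = ivf(x) * math.sqrt(T)
    d1 = math.log(F / x) / s + 0.5 * s
    d2 = d1 - s
    return math.exp(-R * T) * (F * norm_cdf(d1) - x * norm_cdf(d2))

def rnd_analytic(x):
    """Risk-neutral density from the analytic expression (16.58)."""
    sig = ivf(x)
    dsig = B + 2.0 * C * x          # first derivative of the IVF
    d2sig = 2.0 * C                 # second derivative of the IVF
    root_t = math.sqrt(T)
    d1 = (math.log(F / x) + 0.5 * sig * sig * T) / (sig * root_t)
    d2 = d1 - sig * root_t
    x_root_t = x * root_t
    return norm_pdf(d2) * (1.0 / (sig * x_root_t)
                           + 2.0 * d1 * dsig / sig
                           + d1 * d2 * x_root_t * dsig * dsig / sig
                           + x_root_t * d2sig)

def rnd_numeric(x, frac=1e-3):
    """Second-difference approximation (16.60), with delta = frac * x."""
    h = frac * x
    second_diff = call_price(x + h) - 2.0 * call_price(x) + call_price(x - h)
    return math.exp(R * T) * second_diff / (h * h)
```

Comparing `rnd_analytic(x)` with `rnd_numeric(x)` at a few exercise prices is a
quick way to catch algebra or sign errors in an implementation of (16.58).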

A second strategy has been developed by Malz (1997a) and guarantees that the
tails of the RND are well behaved. The original market data, made up of pairs
of exercise and option prices, $(X_i, c_m(X_i))$, is converted into pairs of
deltas and implied volatilities, $(\delta_i, \sigma_i)$, with

$$
\delta_i = \frac{\partial c_{BS}}{\partial S}(S, T, X_i, r, q, \sigma_i)
= e^{-qT} N(d_1(X_i, \sigma_i))
\tag{16.62}
$$

and the function $d_1$ given by (16.59). A parametric relationship
$\sigma = g(\delta \mid \theta)$ is then estimated. From this the IVF is given
by numerically solving the equation

$$ \sigma(X) = g\bigl(e^{-qT} N(d_1(X, \sigma(X)))\bigr). \tag{16.63} $$

The RND is then obtained from the numerical second differences of the
theoretical call prices, using (16.54) and (16.60). Each tail of the RND is
approximately lognormal as $\sigma(X)$ is approximately constant for small $X$
($\delta \approx e^{-qT}$) and large $X$ ($\delta \approx 0$). Malz (1997b) uses
a quadratic function $\sigma = a + b\delta + c\delta^2$ to estimate the RND from
only three FX option prices.
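
The numerical solution of (16.63) can be illustrated as follows. This is only a
sketch under stated assumptions: the quadratic coefficients in `COEF`, the
inputs `F`, `T`, `Q`, and the use of bisection are all hypothetical choices for
illustration, not values or methods taken from Malz.

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical inputs, chosen only for illustration.
F, T, Q = 100.0, 0.25, 0.02
COEF = (0.30, -0.20, 0.15)   # hypothetical a, b, c in sigma = a + b*delta + c*delta^2

def delta(x, sigma):
    """Call delta e^{-qT} N(d1), as in (16.62), written with the forward price F."""
    d1 = (math.log(F / x) + 0.5 * sigma * sigma * T) / (sigma * math.sqrt(T))
    return math.exp(-Q * T) * norm_cdf(d1)

def g(d):
    """Quadratic relationship between delta and implied volatility."""
    a, b, c = COEF
    return a + b * d + c * d * d

def implied_vol(x, lo=0.01, hi=2.0, tol=1e-10):
    """Solve sigma = g(delta(x, sigma)), equation (16.63), by bisection.

    The bracket [lo, hi] is assumed to contain a sign change of
    h(sigma) = sigma - g(delta(x, sigma)); this holds for COEF above
    because g stays between roughly 0.23 and 0.30.
    """
    def h(s):
        return s - g(delta(x, s))
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if h(lo) * h(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

For very low and very high exercise prices the solution flattens out at
`g(exp(-Q * T))` and `g(0.0)` respectively, which is the source of the
approximately lognormal tails.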

A third strategy uses many more parameters by fitting a cubic spline to the 
observed implieds, either as a function of X (Campa et al. 1998) or as a function of 
delta (Bliss and Panigirtzoglou 2002, 2004). These splines are more flexible than 
simple polynomials. They are general cubics between the observations and they 
are constrained so that the functions and their first two derivatives are continuous. 
Either a perfect fit can be guaranteed (Campa et al.) or the quality of the fit can be 
traded off against the smoothness of the fitted function after subjectively selecting 
a trade-off parameter (Bliss and Panigirtzoglou). Splines are also used by Bates 
(1991, 2000), but to fit the call pricing formula instead of the implied volatility 
function. 


16.7 Nonparametric RND Methods 


Parametric methods restrict the shapes that can be estimated for RNDs. The 
extra generality of nonparametric alternatives introduces new problems, how- 
ever, including subjective choices, assumptions of stability through time, and 
inappropriate shapes. 


16.7.1 Flexible Discrete Distributions 


Flexible shapes are obtained by adopting a minimal set of constraints in
conjunction with a large number of degrees of freedom. Rubinstein (1994)
achieves this by estimating discrete probability distributions that have $n + 1$
possible values $S_j$. Then $p = (p_0, p_1, \ldots, p_n)$ is an RND if

$$ p_j \ge 0, \qquad \sum_{j=0}^{n} p_j = 1, \qquad \text{and} \qquad \sum_{j=0}^{n} p_j S_j = F. \tag{16.64} $$

A large value of $n$ is preferable, typically in excess of 100. The option
pricing formula is now

$$ c(X) = e^{-rT} \sum_{j=0}^{n} p_j \max(S_j - X, 0). \tag{16.65} $$

Also, the probabilities are proportional to the prices of butterfly spreads when
the differences $S_{j+1} - S_j$ are all equal to a common, positive value
$\Delta$:

$$ p_i = e^{rT}\,\frac{c(S_{i+1}) - 2c(S_i) + c(S_{i-1})}{\Delta}, \qquad 0 < i < n. \tag{16.66} $$

The vector $p$ can be estimated by minimizing a variety of functions. Jackwerth
and Rubinstein (1996) seek low values of $g(p) + \omega G(p)$ with $G$ measuring
the match between observed and fitted option prices (as in (16.22)) and with $g$
measuring the smoothness of the RND by

$$ g(p) = \sum_{j=0}^{n} (p_{j-1} - 2p_j + p_{j+1})^2. \tag{16.67} $$

The positive trade-off parameter $\omega$ is chosen subjectively and
$p_{-1} = p_{n+1} = 0$. They describe an efficient optimization algorithm and
illustrate its results for S&P 500 index options from 1986 to 1993. This
algorithm does not guarantee nonnegative probabilities, although it seems that
negative values can be avoided by a careful choice of the range of possible
prices, $S_0$ to $S_n$. Jackwerth (2000) describes a related estimation
methodology that seeks low values for the curvature of the implied volatility
function rather than for the curvature of the RND as in (16.67). He finds all
the estimated probabilities are nonnegative for stated values of $\omega$ and
$S_n - S_0$.
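
The relationships (16.65) to (16.67) are easy to verify numerically. The sketch
below uses a small hypothetical distribution (nine equally spaced prices and
made-up probabilities, far fewer than the $n > 100$ recommended above) to show
that butterfly spreads recover the interior probabilities; the names and values
are illustrative assumptions only.

```python
import math

R, T = 0.05, 0.25                          # hypothetical rate and horizon
S = [80.0 + 5.0 * j for j in range(9)]     # equally spaced support, step 5
P = [0.02, 0.05, 0.12, 0.20, 0.25, 0.18, 0.10, 0.06, 0.02]  # hypothetical RND

def call_price(x):
    """Equation (16.65): discounted expected payoff under the discrete RND."""
    return math.exp(-R * T) * sum(p * max(s - x, 0.0) for p, s in zip(P, S))

def butterfly_probs():
    """Equation (16.66): interior probabilities p_1, ..., p_{n-1} from butterflies."""
    step = S[1] - S[0]
    return [math.exp(R * T)
            * (call_price(S[i + 1]) - 2.0 * call_price(S[i]) + call_price(S[i - 1]))
            / step
            for i in range(1, len(S) - 1)]

def smoothness(p):
    """Equation (16.67), using the convention p_{-1} = p_{n+1} = 0."""
    padded = [0.0] + list(p) + [0.0]
    return sum((padded[j - 1] - 2.0 * padded[j] + padded[j + 1]) ** 2
               for j in range(1, len(padded) - 1))
```

In an estimation problem, `smoothness(p)` would be combined with a measure of
pricing errors and minimized over `p`; that optimization step is not shown here.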


16.7.2 Kernel Regression Methods 


Nonparametric regression estimates can avoid making assumptions about the
shape of a regression function. Aït-Sahalia and Lo (1998, 2000) consider
estimation of either the call price formula or the implied volatility function
(IVF) using option price datasets across several days; each dataset contains
prices for several expiry times. Estimation is easier when the number of
explanatory variables is reduced as much as possible. The simplest way to
implement the method estimates the IVF as a function of $Z = X/F_T$ and $T$,
with $F_T$ the futures price now for a transaction at time $T$, as in
Aït-Sahalia and Lo (2000) and Aït-Sahalia, Wang, and Yared (2001). This method
relies on the IVF being stable during the estimation period. This is a strong
assumption. It is empirically dubious for a whole year of option prices, as
used by Aït-Sahalia and Lo (1998, 2000) in their research into S&P 500 data for
1993. Aït-Sahalia and Duarte (2003) describe an alternative nonparametric method
that enforces the constraint that the RND is nonnegative. Fewer data are then
required to estimate densities.


16.7.3 Convolution Approximation 


The positive convolution approximation method of Bondarenko (2003) has
similarities with both nonparametric smoothing methods and parametric mixture
methods. His RNDs are mixtures of normal densities that have equispaced means
and identical standard deviations. The weights of the component normal densities
are obtained by solving a quadratic programming problem.


16.7.4 Entropy Methods 


The entropy of a general RND is defined by

$$ I(f_Q) = -\int_0^\infty f_Q(x)\log(f_Q(x))\,dx = -E^Q[\log(f_Q(S_T))]. \tag{16.68} $$

Buchen and Kelly (1996) suggest estimating the RND by maximizing the entropy
subject to the constraint that a set of observed option prices are perfectly
matched by the theoretical call price formula. Entropy maximization may appear
to make few assumptions. However, the RND has a special form when $N$ option
price constraints are included. The solution then depends on $N + 1$ Lagrange
multipliers $\lambda_i$:

$$
f_Q(x) = \frac{h(x)}{\int_0^\infty h(y)\,dy}
\quad \text{with} \quad
h(x) = \exp\Bigl(\lambda_0 x + \sum_{i=1}^{N} \lambda_i (x - X_i)^+\Bigr).
\tag{16.69}
$$

This continuous density has $N + 1$ segments, each of which is an exponential
function. The first multiplier ensures the distribution is risk-neutral. The
multipliers must be estimated by numerical solution of a system of nonlinear
equations. Coutant et al. (2001) include numerical examples of the estimated
RNDs for interest-rate futures. Stutzer (1996) also applies the principle of
maximizing entropy. Two criticisms of the method are that it is inappropriate to
exactly match observed option prices (that are necessarily discrete) and that
maximization of $-E^Q[\log f_Q(S_T)]$ is an ad hoc objective.


16.8 Towards Recommendations 


No one can say which method for estimating implied RNDs is best. Several 
methods are likely to be satisfactory when enough exercise prices are traded and 
their range captures most of the risk-neutral probability. A satisfactory method 
will score highly on the following eight criteria. 
(i) Estimated densities are never negative.
(ii) General levels of skewness and kurtosis are allowed.
(iii) The shapes of the tails are fat relative to lognormal distributions.
(iv) There are analytic formulae for the density and the call price formula.
(v) Estimates are not sensitive to the discreteness of option prices.
(vi) Solutions to the parameter estimation problem are easy to obtain.
(vii) Estimation does not involve any subjective choices.
(viii) Risk-neutral densities can be transformed easily to real-world densities.


Often it will also be appropriate to expect methods to deliver unimodal densities. 

Nearly all methods are known to be unsatisfactory for at least one of the 
above criteria, including the following: lognormal mixtures, criteria (iii), (v), 
(vi); lognormal-polynomials, (i), (ii), (viii); stochastic volatility, (ii), (iv), (vi); 
parametric implied volatility functions, (i), (viii); spline IVFs, (iii), (iv), (viii); 
flexible discrete distributions, (vi), (vii); kernel regressions (iv), (vi), (viii). Only 
the GB2 method of Section 16.5 appears to satisfy all the above criteria but it has 
not yet received much critical scrutiny. 

There are few recommendations in the research literature because most RND 
studies only evaluate one method. Bahra (1997) prefers lognormal mixtures to 
parametric IVFs. Campa et al. (1998) prefer flexible discrete distributions to log- 
normal mixtures and cubic splines for IVFs, although all three methods give 
comparable densities. Jondeau and Rockinger (2000) compare lognormal mix- 
tures, lognormal-polynomials and Edgeworth expansions, jump-diffusions, and 
stochastic volatility specifications. They prefer lognormal mixtures for short- 
lived options and otherwise jump-diffusions. Coutant et al. (2001), however, 
prefer lognormal-polynomials to lognormal mixtures and entropy maximization. 
Bliss and Panigirtzoglou (2002) prefer implied volatility functions, made up from 
splines, to lognormal mixtures. 


16.9 From Risk-Neutral to Real-World Densities 


The relationship between the real-world density $f_P(x)$ and the risk-neutral
density $f_Q(x)$ can be estimated from time series of asset and option prices in
at least two ways. We first consider a method that specifies a transformation
from $f_Q$ to $f_P$ using economic theory and then present an econometric method
that avoids such theory; examples of appropriate formulae for $f_P$ are given by
equations (16.76) and (16.91). Both methods are illustrated for FTSE densities
at the end of this section.


16.9.1 Transformations from Stochastic Discount Factors 


The theory of asset pricing relates current prices to expectations of discounted
future prices. When the market has a formula for pricing call options across all
exercise prices,

$$
\begin{aligned}
c(X) &= e^{-rT} E^Q[(S_T - X)^+] \\
&= e^{-rT} \int_0^\infty (x - X)^+ f_Q(x)\,dx \\
&= e^{-rT} \int_0^\infty (x - X)^+ \frac{f_Q(x)}{f_P(x)}\, f_P(x)\,dx \\
&= E^P[m(S_T)\,c_T(S_T, X)].
\end{aligned}
\tag{16.70}
$$

Here $c_T(S_T, X) = (S_T - X)^+$ is the price of the option at time $T$ and the
stochastic discount factor for all options is the random variable $m(S_T)$
defined as

$$ m(S_T) = e^{-rT}\,\frac{f_Q(S_T)}{f_P(S_T)}. \tag{16.71} $$

Another name for the stochastic discount factor is the pricing kernel. Equation
(16.70) is the foundation of asset pricing theory based upon present and future
consumption. Cochrane (2001) provides a comprehensive discussion of the theory
for many areas of finance. Its application to option prices is also covered by
Aït-Sahalia and Lo (2000) and Rosenberg and Engle (2002).

Theory relates the stochastic discount factor to the utility function of a
representative agent when some assumptions are made, thereby providing insight
into a suitable formulation for the ratio $f_Q(x)/f_P(x)$. Various theoretical
assumptions are employed by Jackwerth (2000) and Aït-Sahalia and Lo (2000), who
cite earlier contributions by Lucas (1978), Constantinides (1982), and Merton
(1992) among others. With sufficient assumptions, the stochastic discount factor
is simply proportional to the representative agent's marginal utility of
terminal consumption, which can be equated to the terminal asset price. Then

$$ m(x) = \lambda \frac{du}{dx} \tag{16.72} $$

for some utility function $u(x)$, with $\lambda$ an irrelevant positive constant.

The power utility function is used to obtain real-world densities from
risk-neutral densities by Bakshi, Kapadia, and Madan (2003), Bliss and
Panigirtzoglou (2004), and Liu et al. (2004). With

$$
u(x) =
\begin{cases}
\dfrac{x^{1-\gamma}}{1-\gamma}, & \gamma \ne 1, \\[1ex]
\log(x), & \gamma = 1,
\end{cases}
\tag{16.73}
$$

the marginal utility is

$$ u'(x) = \frac{du}{dx} = x^{-\gamma} \tag{16.74} $$

and the relative risk aversion is constant and equal to the CRRA parameter
$\gamma$:

$$ \mathrm{RRA}(x) = -\frac{x\,u''(x)}{u'(x)} = \gamma. \tag{16.75} $$

The parameter $\gamma$ is positive when the agent is risk averse and it equals
zero for the special case of a risk-neutral agent. From equations (16.71),
(16.72), and (16.74), the real-world density is then proportional to the
risk-neutral density multiplied by $x^\gamma$, hence

$$ f_P(x) = \frac{x^\gamma f_Q(x)}{\int_0^\infty y^\gamma f_Q(y)\,dy}. \tag{16.76} $$

The above integral has to be evaluated numerically for many RND methods.

There are at least three important analytic formulae for $f_P(x)$ based upon
power utility functions. First, geometric Brownian motion for asset prices makes
$f_Q(x)$ lognormal as in (16.16). Multiplication of $f_Q$ by
$x^\gamma = \exp(\gamma\log(x))$, followed by simplifying the exponent of the
exponential function, leads to the conclusion that $f_P$ is also lognormal. We
obtain

$$
f_P(x) = \frac{x^\gamma\,\psi(x \mid F, \sigma, T)}{\int_0^\infty y^\gamma\,\psi(y \mid F, \sigma, T)\,dy}
= \psi(x \mid Fe^{\gamma\sigma^2 T}, \sigma, T).
\tag{16.77}
$$

The densities $f_P$ and $f_Q$ have the same volatility parameter $\sigma$, with
different expectations given by

$$ E^P[S_T] = Se^{(r-q+\gamma\sigma^2)T} \quad \text{and} \quad E^Q[S_T] = Se^{(r-q)T}. \tag{16.78} $$

The annualized risk premium, when expected returns are continuously compounded,
is given by

$$ T^{-1}\log(e^{qT}E^P[S_T]/S) - T^{-1}\log(e^{qT}E^Q[S_T]/S) = \gamma\sigma^2. \tag{16.79} $$

Thus the CRRA parameter $\gamma$ then equals the annualized risk premium for the
underlying asset divided by $\sigma^2$. For a typical equity index premium of 6%
per annum and a volatility of 15% per annum we obtain
$\gamma = 0.06/0.15^2 = 2.67$. Conversely, within the Black-Scholes pricing
framework both $f_P$ and $f_Q$ are lognormal; if the assumptions of the
representative agent model also apply, then the agent must have a power utility
function.
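
The lognormal result can be checked numerically. The sketch below computes
$\gamma$ from (16.79) for the 6% premium and 15% volatility quoted above, and
evaluates (16.76) with a midpoint-rule integral so that it can be compared with
the closed form (16.77). The forward price used in any comparison, the
integration range, and the function names are assumptions made for illustration.

```python
import math

def lognormal_pdf(x, f, sigma, t):
    """psi(x | F, sigma, T): lognormal density with mean F and volatility sigma."""
    s = sigma * math.sqrt(t)
    z = (math.log(x / f) + 0.5 * s * s) / s
    return math.exp(-0.5 * z * z) / (x * s * math.sqrt(2.0 * math.pi))

def crra_from_premium(premium, sigma):
    """Invert (16.79): gamma = annualized risk premium / sigma^2."""
    return premium / (sigma * sigma)

def utility_adjusted_pdf(x, gamma, f, sigma, t, n=20000):
    """Equation (16.76), with the normalizing integral found by the midpoint rule.

    The integration range [0.2 F, 5 F] is an assumption that is adequate when
    sigma * sqrt(t) is well below one.
    """
    lo, hi = 0.2 * f, 5.0 * f
    step = (hi - lo) / n
    norm = 0.0
    for i in range(n):
        y = lo + (i + 0.5) * step
        norm += y ** gamma * lognormal_pdf(y, f, sigma, t)
    norm *= step
    return x ** gamma * lognormal_pdf(x, f, sigma, t) / norm
```

For example, with `gamma = crra_from_premium(0.06, 0.15)` the value of
`utility_adjusted_pdf(x, gamma, F, 0.15, 1.0)` should agree closely with
`lognormal_pdf(x, F * math.exp(gamma * 0.15 ** 2), 0.15, 1.0)` for any
reasonable `x` and forward price `F`.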


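Second, suppose $f_Q$ is the mixture of two lognormal distributions defined in
Section 16.5, i.e.

$$ f_Q(x) = p\,\psi(x \mid F_1, \sigma_1, T) + (1 - p)\,\psi(x \mid F_2, \sigma_2, T). \tag{16.80} $$

From the formula for the moments of the mixture distribution (equation (16.26)),

$$ f_P(x) = x^\gamma f_Q(x)/k(\gamma) $$

with

$$
k(\gamma) = pF_1^\gamma \exp\bigl(\tfrac{1}{2}(\gamma^2 - \gamma)\sigma_1^2 T\bigr)
+ (1 - p)F_2^\gamma \exp\bigl(\tfrac{1}{2}(\gamma^2 - \gamma)\sigma_2^2 T\bigr).
\tag{16.81}
$$

This density is also a mixture of lognormal densities. From (16.77) it can be
shown that

$$
\begin{aligned}
f_P(x) &= p^*\,\psi(x \mid F_1^*, \sigma_1, T) + (1 - p^*)\,\psi(x \mid F_2^*, \sigma_2, T), \\
F_i^* &= F_i\exp(\gamma\sigma_i^2 T), \quad i = 1, 2, \\
\frac{1}{p^*} &= 1 + \frac{1 - p}{p}\left(\frac{F_2}{F_1}\right)^{\gamma}
\exp\bigl(\tfrac{1}{2}(\gamma^2 - \gamma)(\sigma_2^2 - \sigma_1^2)T\bigr).
\end{aligned}
\tag{16.82}
$$

Third, suppose $f_Q$ is the GB2 density defined by equation (16.29), with four
parameters $a$, $b$, $p$, and $q$. Then (16.31) shows that $f_P$ is also a GB2
density, with parameters $a$, $b$, $p + (\gamma/a)$, and $q - (\gamma/a)$,
providing $\gamma < aq$.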
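
The mixture formulae (16.81) and (16.82) can be coded directly. The following
sketch returns the real-world mixture parameters so that they can be compared
with the tilted density $x^\gamma f_Q(x)/k(\gamma)$; the function names and any
parameter values used with them are hypothetical.

```python
import math

def lognormal_pdf(x, f, sigma, t):
    """psi(x | F, sigma, T): lognormal density with mean F and volatility sigma."""
    s = sigma * math.sqrt(t)
    z = (math.log(x / f) + 0.5 * s * s) / s
    return math.exp(-0.5 * z * z) / (x * s * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, p, f1, s1, f2, s2, t):
    """Two-component lognormal mixture, as in (16.80)."""
    return p * lognormal_pdf(x, f1, s1, t) + (1.0 - p) * lognormal_pdf(x, f2, s2, t)

def k_moment(gamma, p, f1, s1, f2, s2, t):
    """Normalizing constant (16.81): the gamma-th moment of the Q-mixture."""
    a = 0.5 * (gamma * gamma - gamma) * t
    return (p * f1 ** gamma * math.exp(a * s1 * s1)
            + (1.0 - p) * f2 ** gamma * math.exp(a * s2 * s2))

def real_world_mixture(gamma, p, f1, s1, f2, s2, t):
    """Parameters (p*, F1*, F2*) of the real-world mixture in (16.82)."""
    a = 0.5 * (gamma * gamma - gamma) * t
    ratio = ((1.0 - p) / p) * (f2 / f1) ** gamma * math.exp(a * (s2 * s2 - s1 * s1))
    p_star = 1.0 / (1.0 + ratio)
    return (p_star,
            f1 * math.exp(gamma * s1 * s1 * t),
            f2 * math.exp(gamma * s2 * s2 * t))
```

The volatilities of the two components are unchanged by the transformation, so
only the weight and the two means need to be recomputed.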


16.9.2 Estimates of the CRRA Parameter $\gamma$


Estimation of $\gamma$ in the context of the representative agent model is known
to be a difficult problem, because analysis of consumption data leads to
implausibly high estimates that are necessary to explain the "puzzling" high
level of the equity premium (e.g. Mehra and Prescott 1985). As our particular
interest in $\gamma$ is to use it to move from risk-neutral to real-world
densities, it is logical to select $\gamma$ to obtain a good match between
observed asset prices and real-world densities. We could simply note that
$\gamma$ and $f_Q$ together determine the risk premium for the asset, as
illustrated by (16.79), so an estimate of $\gamma$ can be inferred from an
estimate of the premium. More sophisticated alternatives to matching the mean of
observed asset prices are either maximizing the likelihood of the observations
or minimizing test criteria that detect mis-specification of the real-world
densities. These alternatives may, however, in effect be an indirect way to
obtain satisfactory means for P-densities.

Bliss and Panigirtzoglou (2004) use spline methods to fit their Q-densities for
both S&P 500 options (1983-2001) and FTSE 100 options (1992-2001). They then
select $\gamma$ to make the P-densities conform as closely as possible with the
calibration criteria introduced after the next paragraph. This requires
minimization of a likelihood-ratio test statistic proposed by Berkowitz (2001).
Their estimates of $\gamma$ vary with the option horizon $T$. They equal 3.9
(FTSE) and 4.0 (S&P) for a horizon of four weeks. The similarity of the
estimates for the US and UK markets occurs for all horizons up to four weeks and
is interesting. They also report similar measures of risk aversion when the
utility function is assumed to be an exponential function.

Liu, Shackleton, Taylor, and Xu (2004) fit risk-neutral densities to
high-frequency, FTSE 100 option prices from 1993 to 2000. They use
nonoverlapping P-densities to define the likelihood of a set of 83 four-week
returns. Maximizing the likelihood as a function of $\gamma$ gives estimates
equal to 3.8 and 4.0, respectively for lognormal mixture and GB2 densities.
Likelihood comparisons are made between densities obtained from option prices
and the utility transformation, densities obtained by simulating an asymmetric
ARCH model estimated from daily index returns, and encompassing densities that
combine the option and historical densities. Significant incremental density
information is found in the option densities, at the 2% significance level, but
it is not found in the historical densities at the 5% level.


16.9.3 Calibration Conditions 


Any method that produces a time series of real-world densities $f_P$ can be
appraised by checking if the densities are properly calibrated. Let $F_P$ and
$F_P^{-1}$ respectively denote the cumulative distribution function (c.d.f.) and
its inverse function (not its reciprocal), so

$$ F_P(x) = \int_0^x f_P(y)\,dy \quad \text{and} \quad u = \int_0^{F_P^{-1}(u)} f_P(y)\,dy \tag{16.83} $$

for $0 \le u \le 1$, here assuming the density is defined for $x > 0$ and that
it is a positive and continuous function. Also let $F_{\text{correct}}$ be the
actual real-world c.d.f., which is unknown. Observe that the c.d.f. of the
random variable $U = F_P(S_T)$ is

$$ P(U \le u) = P(S_T \le F_P^{-1}(u)) = F_{\text{correct}}(F_P^{-1}(u)), \quad \text{for } 0 \le u \le 1. \tag{16.84} $$

The two c.d.f.s $F_P$ and $F_{\text{correct}}$ are identical when the density of
$S_T$ is correctly specified. When this happens,

$$ P(U \le u) = F_P(F_P^{-1}(u)) = u. \tag{16.85} $$

Thus $U$ is uniformly distributed, between 0 and 1 inclusive, if and only if the
density of $S_T$ is correctly specified.

Furthermore, suppose density $f_{P,i}$ is produced at time $t_i$ for the asset
price at time $t_i + T_i$ and these densities do not overlap, i.e.
$t_i + T_i \le t_{i+1}$. Then the stochastic process $\{U_i\}$ is i.i.d., with
the above uniform distribution, when all the densities are correctly specified.
The two assumptions of uniformity and independence can be checked either
separately (Diebold, Gunther, and Tay 1998) or jointly by using tests described
in Berkowitz (2001). The data for these tests, when there are $n$ densities, are
given by the observed cumulative probabilities,

$$ u_i = F_{P,i}(S_{t_i + T_i}), \quad 1 \le i \le n. \tag{16.86} $$
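
A small simulation illustrates the calibration condition. The sketch below
draws outcomes from a lognormal distribution, computes the cumulative
probabilities (16.86) under a model density that is either correct or has the
wrong mean, and measures the distance from uniformity with a
Kolmogorov-Smirnov statistic. The lognormal setting, the parameter values, and
the function names are assumptions for illustration; the text does not
prescribe this particular check.

```python
import math
import random

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lognormal_cdf(x, f, sigma, t):
    """C.d.f. of a lognormal price with mean f and volatility sigma over horizon t."""
    s = sigma * math.sqrt(t)
    return norm_cdf((math.log(x / f) + 0.5 * s * s) / s)

def simulate_pit(n, f_true, f_model, sigma, t, seed=0):
    """u_i = F_P(S_T) as in (16.86), with outcomes simulated using mean f_true."""
    rng = random.Random(seed)
    s = sigma * math.sqrt(t)
    values = []
    for _ in range(n):
        outcome = f_true * math.exp(-0.5 * s * s + s * rng.gauss(0.0, 1.0))
        values.append(lognormal_cdf(outcome, f_model, sigma, t))
    return values

def ks_statistic(u):
    """Largest gap between the empirical c.d.f. of u and the uniform c.d.f."""
    ordered = sorted(u)
    n = len(ordered)
    return max(max((i + 1) / n - v, v - i / n) for i, v in enumerate(ordered))
```

A correctly specified model gives a small statistic, while a model whose mean is
too high produces values of `u` concentrated near zero and a large statistic.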


16.9.4 Recalibration Transformations 


The cumulative probabilities $u_i$ should be compared with the uniform
distribution whenever densities are produced for several nonoverlapping periods.
Summary statistics, such as the minimum, maximum, and the three quartiles, may
then indicate that the densities are not correctly calibrated. For example, if
few of the $u_i$ are less than one-quarter, this is evidence that the densities
overestimate the probability of a moderate to large fall in the asset price.

Fackler and King (1990) describe recalibration methods that improve a set of
densities when they are judged against the assumption that their c.d.f.s are
uniformly distributed. Their method can be applied to any set of estimated
densities and can be used to directly transform risk-neutral densities into
real-world densities. The key assumption is that the $u_i$ are all observations
from a common probability distribution.

Now let $f(x)$ and $F(x)$ denote the uncalibrated density and cumulative
distribution function of $S_T$ obtained from some method. The definition of $f$
is not important; for example, it might be an RND or it might be a
utility-adjusted RND given by (16.76). Define the random variable $U$ and its
c.d.f., the calibration function $C(u)$, by

$$ U = F(S_T) \quad \text{and} \quad C(u) = P(U \le u). \tag{16.87} $$

The random variable $C(U)$ is uniformly distributed, because

$$ P(C(U) \le u) = P(U \le C^{-1}(u)) = C(C^{-1}(u)) = u. \tag{16.88} $$

Now define the calibrated cumulative distribution function $F_P$ and a random
variable $U_P$ by

$$ F_P(x) = C(F(x)) \quad \text{and} \quad U_P = F_P(S_T) = C(F(S_T)) = C(U). \tag{16.89} $$

Then $U_P$ is uniformly distributed and hence $F_P$ is correctly specified.

The only catch is that we need to know the function $C$. This could be estimated
from a set of observations $u_i$. A simpler approach is to assume a parametric
specification. Fackler and King (1990) use the cumulative function of the beta
distribution, which is the incomplete beta function defined by (16.34). This is
now written as

$$ C(u) = \frac{1}{B(\alpha, \beta)} \int_0^u t^{\alpha-1}(1 - t)^{\beta-1}\,dt. \tag{16.90} $$

The calibrated density is then

$$
f_P(x) = \frac{dF_P(x)}{dx} = \frac{dC(F(x))}{dx} = \frac{dC}{dF}\frac{dF}{dx}
= \frac{F(x)^{\alpha-1}(1 - F(x))^{\beta-1}}{B(\alpha, \beta)}\, f(x).
\tag{16.91}
$$

The special case $\alpha = \beta = 1$ corresponds to the original densities $f$
being correctly specified.

When $f$ is a risk-neutral density $f_Q$, equation (16.91) converts $f_Q$ to a
real-world density $f_P$. Fackler and King (1990) use the equation to convert
risk-neutral lognormal densities into properly calibrated real-world densities
for the prices of corn, soybeans, live cattle, and hogs.
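
Equation (16.91) needs only the uncalibrated density, its c.d.f., and the beta
function. The sketch below applies it to a lognormal density, which is an
assumption made to keep the example self-contained; the function names are
likewise hypothetical. The beta function is computed from log-gamma values in
the same way as cell J26 of the spreadsheet in Section 16.10.

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lognormal_pdf(x, f, sigma, t):
    s = sigma * math.sqrt(t)
    z = (math.log(x / f) + 0.5 * s * s) / s
    return math.exp(-0.5 * z * z) / (x * s * math.sqrt(2.0 * math.pi))

def lognormal_cdf(x, f, sigma, t):
    s = sigma * math.sqrt(t)
    return norm_cdf((math.log(x / f) + 0.5 * s * s) / s)

def beta_function(a, b):
    """B(a, b) computed from log-gamma values."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def recalibrated_pdf(x, alpha, beta, f, sigma, t):
    """Equation (16.91) applied to a lognormal density f with c.d.f. F.

    Assumes alpha >= 1 and beta >= 1 so that the weight stays finite when the
    c.d.f. rounds to exactly 0 or 1 in the far tails.
    """
    u = lognormal_cdf(x, f, sigma, t)
    weight = u ** (alpha - 1.0) * (1.0 - u) ** (beta - 1.0) / beta_function(alpha, beta)
    return weight * lognormal_pdf(x, f, sigma, t)
```

With `alpha > beta` the recalibration moves probability towards higher prices,
so the real-world mean exceeds the risk-neutral mean.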

Liu et al. (2004) recalibrate risk-neutral mixture and GB2 densities for the
FTSE 100 index, by maximizing the likelihood of a set of observed values for
$S_T$. Their estimates of the two positive parameters are $\alpha = 1.4$ and
$\beta = 1.1$. They show that a necessary and sufficient condition for the
utility function $u(x)$ implicit in (16.91) to have the risk-aversion property
$u''(x) < 0$ is $\beta \le 1 \le \alpha$, with $\alpha \ne \beta$. The
likelihood estimates become $\alpha = 1.3$ and $\beta = 1$ when the
risk-aversion constraint is applied. All their sets of densities are
satisfactorily calibrated, according to a Kolmogorov-Smirnov test. However, the
least value of the Kolmogorov-Smirnov statistic is obtained for historical
densities provided by ARCH simulations.

Figure 16.10. Real-world densities.


16.9.5 FTSE Example 


Four densities for the index level on 17 March 2000 are plotted in Figure 16.10.
The ARCH density of Section 16.2 is shown by a light continuous curve. Its
mode is to the left of the other three modes, all of which are derived from
option prices and GB2 densities. The risk-neutral GB2 density, defined and
discussed in Section 16.5, is represented by the light dotted curve. It is
transformed into a real-world GB2 density, with CRRA parameter $\gamma = 2$,
using equation (16.76) and the parameters defined in the paragraph after
equation (16.82). This real-world density is shown by the dark dotted curve. The
risk-neutral density is also adjusted using the calibration equation (16.91),
with $\alpha = 1.3$ and $\beta = 1.1$, to give the values on the dark,
continuous curve; this real-world density is not a GB2 density. The selected
values of the parameters $\gamma$, $\alpha$, and $\beta$ give annualized risk
premia equal to 14% and 15% for the utility and recalibration methods. The
premia are high because volatility was at a high level on 18 February 2000.

From Figure 16.10, the ARCH density is seen to have a lower mean than the
GB2 real-world densities. This occurs because the ARCH density is estimated
at the close of spot trading on 18 February while the option-based densities are
obtained from prices at midday on the 18th; the index futures contract fell by
87 points from midday until the spot market closed.

Table 16.1 includes the first four moments of $S_T$ and $\log(S_T)$ for each of
the four plotted densities. All the densities obtained from option prices are
much more negatively skewed than the ARCH density. The option densities also
have much more excess kurtosis. The transformations from Q- to P-densities
reduce the standard deviation, skewness, and kurtosis statistics, but the
reductions are not substantial.

The skewness statistics for four-week-ahead FTSE 100 densities, from 1993
to 2000, are similar to those for the example given here. They average $-0.7$
for Q-densities and $-0.6$ to $-0.5$ for P-densities obtained by utility or
recalibration transformations, but only $-0.1$ for ARCH densities (Liu et al.
2004).


16.10 An Excel Spreadsheet for Density Estimation 


This section only illustrates the calculation of risk-neutral and real-world densi- 
ties. Excel calculations are shown in Exhibit 16.1 for the method that assumes the 
implied volatility function is a quadratic. This is a straightforward method that is 
generally satisfactory within the range of traded exercise prices. The spreadsheet 
formulae are shown in Table 16.3. The calculations could certainly be simplified 
by writing a few Visual Basic functions. 

The data used for the calculations are similar to one-third of the data used to 
obtain the previous illustrative results in this chapter. Cells B2-B5 contain the 
present spot rate S, the time until expiry of the option contracts T, the risk-free 
rate r, and the dividend yield q. These are used to obtain the present futures price 
F for a contract having the same lifetime as the options. As we know F for the 
FTSE data, the values of S and q have been replaced by the values of F and r. 
Cells A11—A21, B11—B21, and D11—D21 contain the option data, which comprise 
exercise prices, market implied volatilities, and European option prices. One way 
to obtain the implied volatilities is by repeated use of the Solver tool. If necessary, 
this can be done using the subsidiary spreadsheet Exhibit 16.2, whose formulae 
for cells E11, G11, and H11 are identical to those in Exhibit 16.1. 

The implied volatility function is assumed to be defined by 


c (X) 2 a4 b(X/d) + c(XJdY (16.92) 


with a, b, and c parameters and with d a user-chosen scaling factor that is placed 
in cell G2. The implied volatility function defines the implieds in cells C11-C21, 
which depend on the parameter values in cells E2-E4. From these implieds we 
obtain the call prices in cells E11-E21 and the squared pricing errors shown in 
F11-F21. The sum of the squared pricing errors, shown in cell E6, is minimized 
using Solver by varying the contents of cells E2-E4. It is advisable to try a few 
different initial values for the optimization problem. Exhibit 16.1 shows the best 
solution obtained. 

Exhibit 16.1. An example of density calculations using option prices. 

Table 16.3. Formulae used in the density estimation spreadsheet. 

Cell   Formula 

B6     =B2*EXP(B3*(B4-B5)) 
B7     =EXP(-B3*B4) 
E6     =SUM(F11:F21) 
C11    =$E$2+($E$3*A11/$G$2)+($E$4*A11*A11/($G$2*$G$2)) 
E11    =$B$7*($B$6*NORMSDIST(H11)-A11*NORMSDIST(H11-G11)) 
F11    =(E11-D11)^2 
G11    =C11*SQRT($B$3) 
H11    =0.5*G11+(LN($B$6/A11)/G11) 
B30    =EXP(-0.5*H30*H30)*((1/(E30*I30))+(2*G30*F30/E30) 
       +(G30*H30*I30*F30*F30/E30) 
       +(I30*2*$E$4/($G$2*$G$2)))/SQRT(2*PI()) 
C30    =L30/$L$26 
D30    =B30*(K30^($J$24-1))*((1-K30)^($J$25-1))/$J$26 
E30    =$E$2+($E$3*A30/$G$2)+($E$4*A30*A30/($G$2*$G$2)) 
F30    =($E$3/$G$2)+2*$E$4*A30/($G$2*$G$2) 
G30    =(LN($B$6/A30)+0.5*E30*E30*$B$3)/(E30*SQRT($B$3)) 
H30    =G30-E30*SQRT($B$3) 
I30    =A30*SQRT($B$3) 
J30    =A30*B30 
K30    =1-NORMSDIST(H30) 
       +I30*F30*EXP(-0.5*H30*H30)/SQRT(2*PI()) 
L30    =B30*((A30/$B$6)^$G$24) 
M30    =A30*C30 
N30    =A30*D30 
B26    =K330-K30 
B27    =($A$31-$A$30)*SUM(J30:J330) 
C26    =($A$31-$A$30)*SUM(C30:C330) 
C27    =($A$31-$A$30)*SUM(M30:M330) 
D26    =($A$31-$A$30)*SUM(D30:D330) 
D27    =($A$31-$A$30)*SUM(N30:N330) 
J26    =EXP(GAMMALN(J24)+GAMMALN(J25)-GAMMALN(J24+J25)) 
L26    =($A$31-$A$30)*SUM(L30:L330) 
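
The quantity that Solver minimizes can also be sketched outside the spreadsheet. The Python fragment below is only an illustration under stated assumptions, not the spreadsheet itself: it mirrors the roles of cells C11, G11, H11, E11, F11, and E6, and all numerical inputs (futures price, expiry time, interest rate, exercise prices, and the parameters a, b, c, d) are hypothetical values that do not come from the FTSE data. The function names are likewise invented for this sketch.

```python
import math


def norm_cdf(z):
    """Standard normal c.d.f., the role played by NORMSDIST."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ivf(x, a, b, c, d):
    """Quadratic implied volatility function of equation (16.92) (cell C11)."""
    return a + b * (x / d) + c * (x / d) ** 2


def call_price(F, X, T, r, sigma):
    """European call priced from the futures price F (cells G11, H11, E11)."""
    s = sigma * math.sqrt(T)                 # G11
    d1 = 0.5 * s + math.log(F / X) / s       # H11
    return math.exp(-r * T) * (F * norm_cdf(d1) - X * norm_cdf(d1 - s))  # E11


def sum_squared_errors(params, d, F, T, r, strikes, market_prices):
    """Sum of squared pricing errors (cells F11:F21, summed in E6)."""
    a, b, c = params
    return sum(
        (call_price(F, X, T, r, ivf(X, a, b, c, d)) - price) ** 2
        for X, price in zip(strikes, market_prices)
    )


# Hypothetical inputs, chosen only to exercise the functions.
F, T, r, d = 6000.0, 4.0 / 52.0, 0.05, 6000.0
strikes = [5500.0 + 100.0 * i for i in range(11)]
true_params = (1.2, -1.8, 0.8)
market_prices = [call_price(F, X, T, r, ivf(X, *true_params, d)) for X in strikes]

# The objective is zero at the generating parameters and positive elsewhere.
sse_at_truth = sum_squared_errors(true_params, d, F, T, r, strikes, market_prices)
sse_elsewhere = sum_squared_errors((1.2, -1.8, 0.9), d, F, T, r, strikes, market_prices)
```

Any numerical optimizer can then play the part of Solver by varying the three parameters; as in the spreadsheet, several starting values should be tried.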

Exhibit 16.2. Calculation of implied volatilities. 

The risk-neutral and two real-world densities are shown in columns B to D, 
from row 30 onwards. The range of possible prices $S_T$ when the options expire 
has to be selected so that there is almost no probability that the outcome for $S_T$ 
is outside the range. The cumulative risk-neutral probabilities are useful when 
selecting the range. They are shown in column K and are given by 

$$F_Q(x) = 1 + e^{rT}\frac{\partial c}{\partial X} = 1 - N(d_2) + x\sqrt{T}\,\phi(d_2)\frac{\partial\sigma}{\partial x}, \tag{16.93}$$

from equations (16.12) and (16.57); for the quadratic IVF, 

$$\frac{\partial\sigma}{\partial x} = \frac{b}{d} + \frac{2cx}{d^2}.$$

Densities are obtained for the range from 2000 to 8000 on the spreadsheet, with 
a step size of 20, and hence the density values are located in rows 30-330. 

The risk-neutral density $f_Q(x)$ is in column B and is given by equation (16.58), 
with $\partial\sigma/\partial x$ as above and with $\partial^2\sigma/\partial x^2 = 2c/d^2$. The utility transformation to 
a real-world density uses the CRRA parameter $\gamma$ in cell G24. Then $f_P(x)$ is 
proportional to $(x/F)^{\gamma} f_Q(x)$, which is in column L. The approximate numerical 
integral of $(x/F)^{\gamma} f_Q(x)$ is in L26 and is used to calculate the values of $f_P(x)$ 
in column C. The calibration parameters $\alpha$ and $\beta$ that appear in equation (16.91) 
are in cells J24 and J25, with $B(\alpha, \beta)$ in J26. These values and the cumulative 
probabilities in column K are used to obtain the calibrated real-world density in 
column D. The integrals of the functions f and xf are shown in the rectangle 
B26:D27. These are all numerical approximations except for the integral of $f_Q$. 
The integrals of xf use the values in columns J, M, and N. Note that the last two 
columns are not visible on Exhibit 16.1. 
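
The calculations in rows 30-330 can be sketched in a few lines of Python. This is a hedged illustration that follows the formulae of Table 16.3 rather than a definitive implementation. The function names and the default grid are choices made for this sketch, clamping the cumulative probabilities to the interval from 0 to 1 is an added safeguard that is not in the spreadsheet, and the sketch assumes calibration parameters of at least one so that the end points of the grid cause no division by zero.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / SQRT_2PI


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def risk_neutral(x, F, T, a, b, c, d):
    """Return (f_Q(x), F_Q(x)) for the quadratic IVF; cell names refer to row 30."""
    sigma = a + b * x / d + c * x * x / (d * d)            # E30
    dsigma = b / d + 2.0 * c * x / (d * d)                 # F30
    root_t = math.sqrt(T)
    d1 = (math.log(F / x) + 0.5 * sigma * sigma * T) / (sigma * root_t)  # G30
    d2 = d1 - sigma * root_t                               # H30
    xrt = x * root_t                                       # I30
    bracket = (1.0 / (sigma * xrt)
               + 2.0 * d1 * dsigma / sigma
               + d1 * d2 * xrt * dsigma * dsigma / sigma
               + xrt * 2.0 * c / (d * d))
    density = norm_pdf(d2) * bracket                       # B30
    cdf = 1.0 - norm_cdf(d2) + xrt * dsigma * norm_pdf(d2)  # K30
    return density, cdf


def densities(F, T, a, b, c, d, gamma, alpha, beta,
              lo=2000.0, hi=8000.0, step=20.0):
    """Risk-neutral, utility-transformed and calibrated densities on a grid."""
    n = int(round((hi - lo) / step)) + 1
    xs = [lo + i * step for i in range(n)]
    pairs = [risk_neutral(x, F, T, a, b, c, d) for x in xs]
    f_q = [p[0] for p in pairs]
    u = [min(max(p[1], 0.0), 1.0) for p in pairs]          # clamp: an assumption
    # CRRA transformation: column L, its integral in L26, then column C.
    weighted = [fq * (x / F) ** gamma for x, fq in zip(xs, f_q)]
    total = step * sum(weighted)
    f_p_utility = [w / total for w in weighted]
    # Beta calibration: B(alpha, beta) as in J26, then column D.
    beta_fn = math.exp(math.lgamma(alpha) + math.lgamma(beta)
                       - math.lgamma(alpha + beta))
    f_p_calibrated = [fq * uk ** (alpha - 1.0) * (1.0 - uk) ** (beta - 1.0) / beta_fn
                      for fq, uk in zip(f_q, u)]
    return xs, f_q, f_p_utility, f_p_calibrated


# Hypothetical check: with a flat IVF (b = c = 0) the RND is lognormal with mean F.
xs, f_q, f_p_u, f_p_c = densities(6000.0, 4.0 / 52.0, 0.2, 0.0, 0.0, 6000.0,
                                  gamma=2.0, alpha=1.0, beta=1.0)
integral_q = 20.0 * sum(f_q)
mean_q = 20.0 * sum(x * f for x, f in zip(xs, f_q))
mean_p = 20.0 * sum(x * f for x, f in zip(xs, f_p_u))
```

With a positive CRRA parameter the transformed density has a higher mean than the risk-neutral density, and with both calibration parameters equal to one the calibrated density coincides with the risk-neutral density.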


16.11 Risk Aversion and Rational RNDs 


The usefulness of implied risk-neutral densities $f_Q$ for the estimation of real-world 
densities $f_P$ may depend on observed option prices being correct within some 
theoretical framework. Mispriced options will complicate the interpretation of 
$f_Q$. A particular possibility is that out-of-the-money put options on equity indices 
are overpriced, relative to other options, reflecting anxiety about market crashes 
and/or buying pressure (Bates 2000, 2003; Jackwerth 2000; Bollen and Whaley 
2004). A transformation from an empirical $f_Q$ to an empirical $f_P$ may then unravel 
the effects of mispricing and produce a correctly calibrated density, but it may 
not. It is possible that all transformations that are consistent with economic theory 
produce real-world densities that are incompatible with observed real-world asset 
prices. If so, we can say $f_Q$ is irrational. We now consider research that discusses 
the rationality of risk-neutral density estimates $f_Q$ by making comparisons with 
real-world density estimates $f_P$ that are independently estimated from the history 
of asset prices. 

Estimates of risk aversion have been used to assess the rationality of RNDs. 
From the representative agent model, the representative utility function has 
derivative 

$$u'(x) = \lambda\,\frac{f_Q(x)}{f_P(x)} \tag{16.94}$$

for some positive constant $\lambda$ (see equations (16.71) and (16.72)). A rational utility 
function has a negative second derivative for all values of x. Thus one way to assess 
the rationality of RNDs is to check if $f_Q(x)/f_P(x)$ decreases as x increases, after 
using some history of asset prices to estimate $f_P$ as in Section 16.2. An equivalent 
method is to estimate the risk aversion function implied by the first and second 
derivatives of the utility function, namely 


$$RA(x) = -\frac{u''(x)}{u'(x)} = \frac{f_P'(x)}{f_P(x)} - \frac{f_Q'(x)}{f_Q(x)} = -\frac{d}{dx}\log\left(\frac{f_Q(x)}{f_P(x)}\right). \tag{16.95}$$


This function must be positive for all x if the utility function is rational. The same 
condition applies to the relative risk aversion function, 


RRA(x) = xRA(x). (16.96) 


Empirical estimates of RRA can be used to assess rationality and the applicability 
of a power utility function, whose RRA function is constant and equal to the CRRA 
parameter $\gamma$ (see equation (16.75)). 
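
A small numerical check may help to fix ideas. The sketch below, which is only an illustration, estimates RA and RRA from two densities by differencing the logarithm of their ratio. The densities are hypothetical: a lognormal risk-neutral density and the real-world density implied by a power utility function with CRRA parameter 2, for which the RRA function should be constant at 2. The function names and the finite-difference step are choices made for this sketch.

```python
import math


def lognormal_pdf(x, mean_log, sd_log):
    """Density of a price whose logarithm is normal with the given mean and s.d."""
    z = (math.log(x) - mean_log) / sd_log
    return math.exp(-0.5 * z * z) / (x * sd_log * math.sqrt(2.0 * math.pi))


def risk_aversion(f_q, f_p, x, h=1.0):
    """RA(x) = -d/dx log(f_Q(x)/f_P(x)), using a central difference of width 2h."""
    upper = math.log(f_q(x + h) / f_p(x + h))
    lower = math.log(f_q(x - h) / f_p(x - h))
    return -(upper - lower) / (2.0 * h)


def relative_risk_aversion(f_q, f_p, x, h=1.0):
    """RRA(x) = x RA(x)."""
    return x * risk_aversion(f_q, f_p, x, h)


# Hypothetical densities: lognormal f_Q with mean F, and f_P proportional to
# x**gamma * f_Q(x), which is lognormal with the mean of log(x) raised by gamma*s*s.
F, gamma = 6000.0, 2.0
s = 0.2 * math.sqrt(4.0 / 52.0)
m_q = math.log(F) - 0.5 * s * s


def f_q(x):
    return lognormal_pdf(x, m_q, s)


def f_p(x):
    return lognormal_pdf(x, m_q + gamma * s * s, s)


rra_values = [relative_risk_aversion(f_q, f_p, x) for x in (5500.0, 6000.0, 6500.0)]
```

For empirical densities the same differencing would be applied to estimated values on a grid, where estimation error in either density can easily produce negative or increasing RA estimates.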

An important issue when we check empirical ratio functions $f_Q(x)/f_P(x)$ is 
that $f_Q$ is given by more information than the history of asset prices used to 
obtain $f_P$. We know from Chapter 15 that the extra information is reflected by 
the standard deviation of $f_Q$ being a more accurate predictor of volatility than the 
standard deviation of $f_P$. These standard deviations are not even equal on average 
when volatility risk is priced, as noted in Section 14.5. It is not known what can be 
learnt from a single empirical ratio function when different information defines 
the two densities. We may hope that the noise created by the differences can be 
reduced by calculating an average across several ratio functions. 

Figure 16.11. Logarithm of the ratio of Q- and P-densities. 

Figure 16.11 shows an estimate of the function $\log(f_Q(x)/f_P(x))$ for the illus- 
trative FTSE data; $f_Q$ is obtained from the GB2 method and adjusted to make it 
contemporaneous with $f_P$ given by the ARCH density of Section 16.2. It can be 
seen that $f_Q(x)/f_P(x)$ is estimated to be an increasing function for 6060 < x < 
6680, which can be restated as negative risk aversion for 0.97 < x/F < 1.07. 

Jackwerth (2000), Ait-Sahalia and Lo (2000), and Rosenberg and Engle (2002) 
all estimate RA once a month from one-month options on the S&P 500 index; 
Jackwerth does this for the decade from 1986 to 1995, Ait-Sahalia and Lo only 
consider 1993 and Rosenberg and Engle cover 1991 to 1995. Their methodolo- 
gies are quite different. Jackwerth finds that the averages of the RA functions 
are credible before the crash of October 1987 but appear irrational afterwards. 
Post-crash, the estimated RA is negative in the range 0.96 < x/F RA 
also increases for x/F > 0.99, which is incompatible with power and other utility 
functions. Jackwerth concludes that the most likely explanation of the RA esti- 
mates is that the market has consistently mispriced some options. It is possible 
that his real-world density estimates contribute to the apparent irrationality; his 
kernel estimates ignore stochastic volatility while his GARCH(1, 1) estimates do 
not allow for negative skewness in one-month returns (which can be obtained 
using a GJR(1, 1) model). 

Ait-Sahalia and Lo obtain their RND estimates using the kernel regression 
methods mentioned in Section 16.7, which rely on the implied volatility function 
being stable through time. The sensitivity of their results to this suspect assumption 
is unknown. Their RRA estimates are positive, but appear to be inconsistent with 
a power utility function. 

Rosenberg and Engle use the GJR-GARCH model to estimate $f_P$ and then 
estimate empirical pricing kernels (EPKs) by polynomial functions that best 
match observed option prices. Thus their methodology separately estimates $f_P$ 
and $f_Q/f_P$. Like Jackwerth, they estimate RA to be irrational; it is negative in 
the range 0.96 < x/F < 1.02. They also show that RA is a time-varying quan- 
tity that is countercyclical; risk premia are low (high) near business cycle peaks 
(troughs). 

Risk aversion estimates have also been obtained for other assets, including 
Italian bond futures (Fornari and Mele 2001), the CAC 40 index (Pérignon and 
Villa 2002), and the FTSE 100 index (Liu et al. 2004). 

A different methodology for assessing the rationality of implied risk-neutral 
densities is used by Ait-Sahalia et al. (2001). Their results are for three-month 
options on the S&P 500 index from 1986 to 1994. They make no assumptions 
about the representative investor. Instead they assume index dynamics are deter- 
mined by a one-factor diffusion process, so volatility is deterministic. The risk- 
neutral version of these dynamics is estimated and shown to produce RNDs that 
differ significantly from those implied by option prices. They then show that 
adding a jump component (that permits rare crashes) can partly reconcile the dif- 
ferences between the two sets of RNDs. Related evidence against deterministic 
volatility is given in Buraschi and Jackwerth (2001). 


16.12 Tail Density Estimates 


It is very difficult to estimate the probabilities of extreme price movements. 
Market implied densities may be useful within the range of traded exercise prices 
but outside this range they are merely extrapolations. The most practical way to 
estimate tail probabilities is to make use of the extreme value theory that was 
noted in Section 12.12. 

All densities are now supposed to be real-world densities and we consider 
shapes for the left tail, which corresponds to an extreme fall in the asset price. 
Let f(x) and F(x) denote the density and the cumulative function of a future 
price and let g(r) and G(r) denote these functions for the return defined by 
$r = \log(x) - \log(S)$. 

Then extreme value theory suggests we select 

$$G(r) \propto (-r)^{-\alpha}, \qquad r \le r_L, \tag{16.97}$$

for some tail index $\alpha$ and some threshold $r_L < 0$. We may suppose that one 
of the methods presented in this chapter gives us a credible density f(x) over 
some interval around the current price, from which we can determine the values 
of g(r) and G(r) for plausible threshold levels. We then use the power law 
(16.97) to specify the left tail density and cumulative functions as 

$$g(r) = g(r_L)\left(\frac{r}{r_L}\right)^{-(\alpha+1)}, \qquad r \le r_L, \tag{16.98}$$

and 

$$G(r) = G(r_L)\left(\frac{r}{r_L}\right)^{-\alpha}, \qquad r \le r_L. \tag{16.99}$$

As G(r) is the integral of g from minus infinity to r, the tail index is constrained 
to be 

$$\alpha = -\frac{r_L\,g(r_L)}{G(r_L)} = -\frac{r_L\,x_L f(x_L)}{F(x_L)}, \tag{16.100}$$

with $x_L$ the price level that corresponds to a return $r_L$. 

The above remarks motivate the following empirical strategy: estimate a price 
density f(x) and then seek a threshold for which the tail index given 
by (16.100) is plausible. Appropriate values of $\alpha$ are between three and five, 
according to the literature mentioned in Section 12.12. For the illustrative FTSE 
data and the real-world density defined by recalibrating the risk-neutral GB2 den- 
sity, $\alpha = 3.02$ when the threshold is a futures return $r_L$ equal to -15%. The esti- 
mated probability of a futures price below $x_L = 5340$ is $G(r_L) = F(x_L) = 2.0\%$. 
The probabilities of even larger price falls during the four-week period under 
scrutiny can be estimated from (16.99). For example, the estimated probability of 
the event $S_T < 5040$ is 1 in 130 so an event as extreme as this occurs on average 
once every ten years. The 25-year event is $S_T < 4670$ for which the futures price 
must fall by at least 25%. 
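
The tail calculations can be sketched as follows. This is an illustration only: the function names are invented here, the reference price is backed out from the rounded figures quoted in the text ($x_L = 5340$ at a return of -15%), and the rounded inputs mean that the results reproduce the quoted probabilities and price levels only approximately.

```python
import math


def tail_index(r_l, g_l, G_l):
    """Tail index implied by continuity at the threshold, equation (16.100)."""
    return -r_l * g_l / G_l


def tail_probability(x, ref_price, r_l, G_l, alpha):
    """Probability of a price below x in the left tail, equation (16.99)."""
    r = math.log(x / ref_price)
    if r > r_l + 1e-12:
        raise ValueError("x must be at or below the threshold price")
    return G_l * (r / r_l) ** (-alpha)


def tail_quantile(p, ref_price, r_l, G_l, alpha):
    """Price level whose left-tail probability is p (inverse of 16.99), p <= G_l."""
    r = r_l * (p / G_l) ** (-1.0 / alpha)
    return ref_price * math.exp(r)


# Rounded inputs taken from the illustrative example in the text.
alpha, r_l, G_l, x_l = 3.02, -0.15, 0.02, 5340.0
ref_price = x_l * math.exp(-r_l)      # price corresponding to a zero return

p_5040 = tail_probability(5040.0, ref_price, r_l, G_l, alpha)

# 13 four-week periods per year, so a 25-year event has probability 1/(13*25).
x_25_year = tail_quantile(1.0 / (13.0 * 25.0), ref_price, r_l, G_l, alpha)
```

Here `p_5040` is somewhat below 1 in 130 and `x_25_year` lies within about half a percent of the quoted 4670; the differences are attributable to the rounding of the inputs.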


16.13 Concluding Remarks 


Research into density prediction has so far produced many methods but few 
conclusions. Simulation of ARCH models is a straightforward method, although 
it requires a substantial amount of numerical calculations. Option-based meth- 
ods have the advantage of using more information but may be less reliable if 
option prices are incompatible with a rational theoretical framework. Further 
research, that compares and combines real-world densities derived from ARCH 
and option methodologies, is necessary to provide guidance about the most appro- 
priate method for density prediction. The likelihood of observed asset prices, 
calculated from their predictive densities, is an important statistic for measuring 
the accuracy of a method. The compatibility of cumulative probabilities with a 
uniform distribution is also important. 
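
The two accuracy measures mentioned above can be computed as in the following sketch. It is an illustration under assumptions: the inputs are the predictive density values and the predictive cumulative probabilities evaluated at the observed prices, and the Kolmogorov-Smirnov distance is one convenient way to compare the cumulative probabilities with a uniform distribution, chosen for this sketch rather than prescribed by the text.

```python
import math


def log_likelihood(density_values):
    """Sum of the logs of predictive densities evaluated at the observed prices."""
    return sum(math.log(v) for v in density_values)


def ks_distance_from_uniform(cumulative_probabilities):
    """Largest gap between the empirical c.d.f. of the values and the uniform c.d.f."""
    u = sorted(cumulative_probabilities)
    n = len(u)
    return max(max((i + 1) / n - v, v - i / n) for i, v in enumerate(u))


# Hypothetical inputs: evenly spread probabilities are close to uniform,
# while probabilities bunched at 0.5 are not.
spread = [(i + 0.5) / 10.0 for i in range(10)]
bunched = [0.5, 0.5, 0.5, 0.5]
ks_spread = ks_distance_from_uniform(spread)
ks_bunched = ks_distance_from_uniform(bunched)
```

A method with a higher log-likelihood and a smaller distance from uniformity is preferred, other things being equal.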
Symbols 


A summary of selected important symbols, functions, and acronyms is provided 
here. Chapter and section number references to examples and/or definitions are 
shown in square brackets. Inevitably, several symbols have more than one role. 


Roman Letters 


A       A proportion of the variance of an ARMA(1, 1) model [3.6], 
        or a measure of asymmetry in an ARCH model [10.2] 
b       An estimate of the variance of an autocorrelation multiplied by 
        the number of observations [5.3] 
B       A trading rule bandwidth [7.2], or a beta coefficient [16.5] 
c       A price of a call option [14.2], or a trading cost [7.9] 
d       A dividend payment [2.5], or a level of fractional differencing [3.8] 
$d_1$   A function used in the BS formulae, likewise $d_2$ [14.3] 
D       A general distribution [9.6] 
e       The number 2.718 28... 
$e_t$   The residual term in an ARCH model [9.5] 
f       A density function [3.2], or a futures price [2.5], 
        or a forecast [3.5, 15.2] 
F       A futures or forward price [14.2], or a cumulative distribution 
        function [2.5], or the Fama-French random walk test statistic [6.3] 
g       The residual term in the log variance of an EGARCH model [10.2] 
h       A conditional variance [9.5] 
H       A forecast horizon [15.2], or a half-life parameter for forecasts [9.4] 
i       Square root of -1 
I       An information set [9.5], sometimes the history of prices [5.2, 7.2] 
j       Counts intraday trading periods [12.5] 
J       A jump variable [13.5], or the Jegadeesh random walk 
        test statistic [6.3] 
k       The kurtosis [4.6], or a standardized forecast [7.2] 
K       The runs test statistic [6.5] 
l       The logarithm of an absolute excess return [4.10, 11.5], 
        or the likelihood of one observation [9.4, 10.4] 
L       The lag operator [3.5], or the likelihood of a set of 
        observations [9.5, 10.4, 11.6], or the parameter of 
        the channel trading rule [7.2] 
m       The stochastic discount factor [16.9] 
n       A number of observations in a time series [4.2] 
N       A number of: time periods over which a multi-period return is 
        calculated [5.3, 11.9], information events [8.3], intraday 
        periods [12.8], or jumps [13.5]; 
        the general normal distribution [3.2], 
        or the c.d.f. of the standard normal distribution [14.3], 
        or the news impact function [10.2] 
p       An asset price [2.5], or the price of a put option [14.2], 
        or a probability parameter [16.5], 
        or the number of autoregressive parameters [3.5], 
        or of lagged functions of returns in an ARCH model [9.5] 
P       A probability [2.5], or a real-world probability measure [14.3] 
q       The number of moving-average parameters [3.5], 
        or of lagged conditional variances in an ARCH model [9.5], 
        or the quantity of an asset [7.2], or the dividend yield rate [14.2] 
Q       A risk-neutral probability measure [14.3], 
        or a portmanteau test statistic [4.9] 
r       A return on investment [2.5], or a risk-free interest rate [14.2] 
R       An annual return [4.3], or an excess return made by 
        a futures trader [7.9], or a correlation [15.3] 
R/S     A rescaled range test statistic [6.6] 
s       A historical standard deviation [4.2], 
        or a squared excess return [4.11], 
        or a spectral density function [3.3] 
S       An asset price in continuous time [13.4], 
        or a sign variable in asymmetric ARCH [9.7] or volatility models [11.9] 
t       Time, measured in trading periods [2.5] 
T       The time until expiry of a derivative contract [14.2], 
        or the trend test statistic [6.3] 
u       A zero-mean, unit-variance random variable [8.4], 
        whose distribution is sometimes normal, 
        or a utility function [16.9], or a cumulative probability [16.9] 
v       A forecast error when predicting squared residuals [9.2, 9.3] 
V       The variance of a sum of returns [5.3], 
        or the stochastic variance process [13.4] 
VR      A ratio of variances [5.3] 
W       A Wiener process [3.10, 13.2], or a BDS test statistic [6.7] 
x       A possible outcome of a random variable [3.2], 
        for example, a later asset price [16.3] 
X       A random variable [3.2], or the exercise price of an option [14.2] 
y       A target when forecasting volatility [15.2] 
Y       A random variable [3.2], for example, 
        the logarithm of an asset price [14.6] 
z       A standardized test statistic [4.1, 5.3, 7.3], 
        or standardized residual [9.5] 
Z       A Wiener component in a stochastic variance process [13.4] 


Greek Letters 


$\alpha$     Multiplier of the lagged squared return in an ARCH model [9.3], 
             or the average log volatility in a SV model [11.5], or the mean of 
             a continuous-time process [13.3] 
$\alpha^-$   An additional multiplier for squares of 
             lagged negative returns [9.7] 
$\beta$      Multiplier of the lagged conditional variance in 
             an ARCH model [9.3], or the standard deviation of 
             log volatility in a SV model [11.5] 
$\gamma$     Multiplier of the absolute standardized residual in an EGARCH 
             model [9.2], or the relative risk aversion parameter [16.9] 
$\Gamma$     The gamma function [9.6] 
$\delta$     A correlation parameter in an asymmetric volatility model [11.9] 
$\Delta$     An autoregressive parameter in an EGARCH model [10.2], 
             or a small time increment [13.3] 
$\varepsilon$  A white noise variable [3.5] 
$\eta$       A white noise variable [3.4, 11.5], or a tail-thickness parameter [9.6] 
$\theta$     A moving-average parameter [3.5], or a vector of 
             parameters [9.5, 11.6, 16.4] 
$\vartheta$  Multiplier of the standardized residual in an EGARCH model [9.2] 
$\Theta$     A moving-average parameter [9.5] 
$\kappa$     A continuous-time mean-reversion parameter [13.3] 
$\lambda$    An autocovariance of a stochastic process [3.3], or an ARCH-M 
             parameter [9.5], or a proportion of daily variance [12.5], 
             or the intensity rate of a Poisson process [13.5] 
$\mu$        The mean of a random variable [3.2], 
             sometimes a conditional mean [9.5] 
$\nu$        The degrees-of-freedom parameter for the t-distribution [9.6], 
             or an implied volatility [15.6] 
$\xi$        The volatility parameter in a square-root process [13.3], 
             which often measures the "volatility of volatility" [14.6] 
$\pi$        The number 3.141 59... 
$\rho$       An autocorrelation of a stochastic process [3.3], 
             or a correlation between two Wiener processes [13.4] 
$\sigma$     The standard deviation of a random variable [3.2], 
             or a measure of volatility [11.2] 
$\tau$       The difference between two times, often called a lag [3.3] 
$\phi$       An autoregressive [3.5, 11.5] or a persistence [9.7] parameter 
$\Phi$       The c.d.f. of the standard normal distribution [3.2] 
$\psi$       A lognormal density function [16.3] 
$\omega$     A frequency [3.3], or a constant term in an ARCH model [9.3] 


Mathematical Functions and Notation 


$x^+$           The maximum of x and zero 
exp(x)          The exponential function (the number e raised to the power x) 
log(x)          The natural logarithm of x 
$x \approx y$   x is approximately equal to y 
max(x,y)        The maximum of x and y 


Notation for Random Variables [Section 3.2] 


$X \sim \cdots$ The distribution of X is $\cdots$. 

E[X] The expectation of a random variable X 

var(X) The variance of a random variable X 

Y|X A random variable Y conditional on another variable X 
cor(X, Y) The correlation between two random variables 


cov(X, Y) The covariance between two random variables 


Acronyms 

AR Autoregressive [3.5] 

BS Black-Scholes [14.3] 

c.d.f. Cumulative distribution function [3.2] 
CH Conditionally heteroskedastic [9.1] 
EMH Efficient market hypothesis [5.2] 

FI Fractionally integrated [3.8] 

I Integrated [3.7] 

i.i.d. Independent and identically distributed [3.2] 

MA Moving average [3.5] 

MD Martingale difference [3.4] 

RND  Risk-neutral density [16.3] 

RWH Random walk hypothesis [5.2] 

SV Stochastic volatility [11.2] 

SWN Strict white noise [3.4] 

WN White noise [3.4] 


Compound Acronyms 

ARFIMA Autoregressive, fractionally integrated, 
moving average [3.8] 

GARCH Generalized autoregressive conditionally 
heteroskedastic [9.3] 

EGARCH Exponential, generalized, ... [10.2] 


FIEGARCH Fractionally integrated, exponential, 
generalized, ... [10.3] 
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from intraday returns, 312-4, 321 
geometric decay, 38, 203, 237 
hyperbolic decay, 47, 341 
linear function of, 103, 129, 131 
MA(1) process, 38 
of absolute returns, 82-92, 313-4 
of logarithmic functions, 82, 86, 
237, 280 
of log realized volatility, 337-8 
of ranges, 347 
of rescaled returns, 116—9 
of returns, 76-82, 108, 312-3 
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80 
of signs, 134 
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184-6 
sample estimates, 76, 112 
bias, 113, 139 
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109, 116 
standard errors, 91 
theory, 49, 112-5 
squared linear process, 95 
tests, 80 
autocovariance, 31 
autoregressive, 
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see also ARCH model 
parameter, 36 
process, see ARMA process, 
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Bayes' theorem, 274 
Bayesian methods, 61, 286 
BDS test, 136-8, 147 
beta, 56, 64, 180, 183 
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spread, 12, 64, 178, 181, 307-11, 
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Black-Scholes formulae, see option 
prices, Black-Scholes 
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arithmetic, 356, 363 

geometric, 49, 189, 191, 346, 356, 

372, 428 

standard, 354 
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calendar effects, 59-68, 337 
autocorrelation created by, 66, 94, 
149 
calibration function, 456 
central bank, 
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and trading profits, 184, 325 
news about, 342 
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central limit theorem, 30, 54, 164, 
296, 312 
channel length, 160, 166, 171 
chaotic dynamics, 136 
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exponent, 76 
function, 389, 396 
closed-market effects, 311, 319, 333, 
417 
see also weekend effects 
complex numbers, 389, 396 
conditional, 
density, 27, 91, 197, 212, 246, 274, 
343, 396 
expectation, see expectation, 
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see also ARCH model 
probability, 2, 137, 273-4, 277, 
396 
variance, see variance, conditional 
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see also option 
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between, 
realized and forecast volatility, 
402-3 
spot and futures returns, 22 
Wiener processes, 359 
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Excel function, 104 
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zero and independence, 29 
covariance, 27, 103 
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crash, 
fear of a, 381, 461 
of October 1987, 52, 68, 78, 83, 
139, 192, 255, 333 
currency, see exchange rate 
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errors, 13, 150, 307-8 
frequency, 11, 305, 308, 332 
mining, 59 
snooping, 167, 175 
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of the month effects, 63 
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degrees-of-freedom, 218, 262, 300, 
426 
density, 24 
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calibration, 454—61 
conditional, see conditional, 
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encompassing, 454—5 
kernel estimates, 71, 230, 334, 
336, 427 
posterior, 286—7, 295, 365 
prediction, 423-66 
prior, 286-7 
real-world, see real-world density 
risk-neutral, see risk-neutral 
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see also distribution 
dependence, 
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diffusion process, 355-61 
bivariate, 359—61, 383-4, 389 
constant elasticity of variance, 357 
drift, 355, 376, 384 
increment, 354, 363 
jump, see jump process, -diffusion 
limit of ARCH models, 360-1 
Ornstein-Uhlenbeck, 357, 387, 
390 
sample path, 354 
examples, 355—9, 364, 367 
square root, 358—61, 365, 384-9, 
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volatility function, 355 
see also Brownian motion and 
Wiener process 
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distribution, 
beta, 341, 440, 456 
bimodal, 282, 436, 438 
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double exponential, 218 
exponential, 343, 362, 365 
gamma, 74, 359 
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439-42, 451, 454, 456-8, 
463, 465 
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lognormal-polynomial, 441—5 
mixture, 436 
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of normals, 73, 219, 269, 272, 
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217, 251, 391 
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standard, 26, 30, 72, 218 
Poisson, 362 
stable, 75 
Student-t, 74, 75, 218, 225, 230, 
237, 262, 265, 291, 426 
symmetric, 26, 69 
uniform, 455-6 
Weibull, 343 
distribution function, 
cumulative, 24, 345, 396, 440, 455 
normal, 373, 377 
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probability, 24 
diurnal effects, 315 
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and volatility, 242 
yield, 55, 58, 167, 183, 371, 373, 
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duration, 
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hypothesis, weak-form, 158 
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test of, 178 
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ex ante, 175—6, 180, 182, 185, 403, 
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see also volatility, forecast, 
in-sample 
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rationalizations, 59 
see also volatility, forecast, 
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Excel function, see spreadsheets 
exchange rate, 
currency futures, see price, futures, 
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trading, see trading rules, currency 
profits 
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215, 343, 356-8, 386, 405 
as optimal forecast, 402 
risk-neutral, see option price, as 
risk-neutral expectation 
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extreme values, see returns, extreme 
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fat tails, see returns, distribution 
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Kalman, see stochastic volatility 
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and returns, 59, 64 
and variance ratios, 111 
portfolios, 57 
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of returns, 162-3 
of volatility, see volatility, forecast 
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MA(1), 38 
optimal nonlinear, 35 
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forward contract, 430 
see also price, forward 
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integration, 46, 243 
of realized volatility, 337-41 
parameter d, 337-40 
frequency, 130, 339 
data, see data, frequency 
high, see high-frequency 
futures contracts, 
gearing, 176 
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on volatility, 388 
prices, see prices, futures 


gamma function, 218, 439 
Excel function, 225, 441 
Gaussian, 
distribution, see distribution, 
normal 
process, see stochastic process, 
Gaussian 
generalized method of moments, 
284—5, 300, 410 
genetic algorithms, 176, 326—7 
gold returns, 79, 85, 87 
GPH estimate, 339—40 
Great Depression, 192 


half-life, 212, 229, 322, 357 
Hermite polynomial, 443 
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heteroskedasticity 
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and volatility forecasting, 414—20 
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at-the-money approximation, 378 
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dynamics, 383 
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scaling, 408 
stochastic volatility option prices, 
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implied volatility (continued) 
used to forecast realized volatility, 
407-20 
incomplete beta function, 440, 456 
Excel function, 441 
independent 
and identically distributed random 
variables, 30, 49, 200-1, 
269, 455 
autocorrelation theory, 49, 113 
do not define the random walk 
hypothesis, 100 
returns are not, 91 
random variables, see random 
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information, 
and volatility, 193-4, 268 
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impact on related markets, 342—3 
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Itó process, 355 
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trading rules, 166, 183-6 
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call and put options, 372 
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dependence, 82-92 
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as risk-neutral expectation, 376, 
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Black-Scholes formulae, 372-7 
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option price (continued) 
commodity, 413—5, 441, 456 
crude oil, 415, 438 
currency, 382, 390, 409-10, 
413-6, 441, 445, 448 
delta, 373-4, 447-8 
FTSE 100, 379-80, 433-4, 
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hypothesis, 127, 131 
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pricing errors, 44 
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probability, 
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random walk, 236, 354, 363 
forecast, 404 
random walk hypothesis, 
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144-7 
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test methodology, 122-6 
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test results, 138-44 
test size, 122-3, 144-5 
test statistics, 
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multi-period methods, 128 
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runs test, 133-4, 147 
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spectral, 130-3 
variance-ratio, 103—5 
range, 
rescaled, 135 
see also price, range 
realized, 
bipower, 332 
correlation, 333 
covariance, 333, 342 
variance, 327 
volatility, see volatility, realized 
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from ARCH simulations, 424-8, 
457-8 
from risk-neutral density, 
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451-4, 457 
spreadsheet, 458-61 
tails, 464-5 
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definitions, 14 
distribution, 73-6 
conditional on realized 
volatility, 332-4 
fat tails, 69, 71 
high peak, 69, 71, 312 
not normal, 69—73, 312 
see also real-world density 
excess, 101 
from futures trading, 179 
extreme values, 71-3, 344-6, 
464-5 
futures, 18, 76 
intraday, 311, 316, 327, 399, 414 
logarithmic absolute, 82 
long-horizon, 45, 112 
mean-reversion, 44, 112, 129 
multi-period, 16, 102, 128 
nominal, 17 
overnight, 311, 333, 417 
predictability and trading rules, 
163-72 
rescaled, 115-20 
induced autocorrelation, 147, 
154-5 
simple, 16 
spot FX, 18 
standard deviation, 52, 57-9 
standardized, 71—2, 209, 219, 270, 
392 
distribution almost normal, 
332-5 
kurtosis, 334 
see also ARCH model, 
standardized residuals 
transformed, 81—9 
weekend, 60-1, 66, 311 
zeros, 73, 312 
risk adjustments, 175, 177, 182 
risk aversion, 462-4 
empirical estimates, 461—4 
irrational, 463—4 
relative, 252, 452, 462-4 
constant, 391 
CRRA parameter, 452-4 
risk management, 73, 424 
risk, market price of, 376, 392 
risk-neutral density, 423 
analytic, 446 
concept, 428, 430 
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risk-neutral density (continued) 
criteria, 450-1 
definition, 430 
entropy, 450 
estimation, 434—41, 444—7 
existence, 430 
GB2, 428, 439-42, 451, 454, 
456-8, 463 
implied, 429, 431—5 
irrational, 462 
issue of extrapolation, 434, 447 
lognormal, 376, 431 
mixture, 435-8, 453-4, 456 
nonparametric, 448-50 
parametric, 435-45 
recommendations, 450-1 
smoothness, 435, 449 
spline method, 448, 454 
spreadsheet, 458-61 
surveys, 431 
uniqueness, 430 
risk premium, 428, 453 
equity, 55 
futures, 56 
time-varying, 56, 148, 183 
trading rule assumption, 176 
see also volatility, risk premium 
risky bill, 179 


scaling law, 338-9 
scatter diagram, 78 
settlement, 430 
delayed, 17 
short, 
memory, see memory, short 
selling, 61, 176-7, 373 
sign bias test, 220 
significance level, 61, 123 
size, 
effect, see firm size, and returns 
of tests, see random walk 
hypothesis, test size 
skewness, 25 
of daily returns, 52, 68 
of intraday returns, 312 
of multi-period returns, 296, 427, 
458 
of risk-neutral densities, 438, 447, 
spectral density function, 33, 47, 
130-3, 341 
ARMA(1, 1) process, 131 
long memory shape, 47, 339 
sample estimate, 131 
spline, 448 
spreadsheets, 
Black-Scholes prices, 374 
density estimation, 458-61 
GARCH(1,1) model, 205-10 
GJR(1,1) model, 222-8 
Markov volatility model, 275-7 
standard SV model, 288-91 
trading rule information test, 172 
variance-ratio test, 105 
standard deviation, 25 
of returns, 52, 57-9 
state space model, 347-8 
see also stochastic volatility 
representation 
state variable, 242, 279, 383-4, 389, 
stationary, see stochastic process, 
stationary 
stochastic calculus, 354-6 
stochastic differential equation, 
355-65 
stochastic discount factor, 452 
stochastic process, 30 
autocorrelated, 34 
continuous-time, 353-68 
definitions, 31 
Gaussian, 32, 92, 134, 278 
inapplicable for returns, 69 
integrated, 46 
linear, 48, 49, 92 
multivariate symmetry, 113 
nonlinear, 48 
evidence for, 82, 92-3 
see also nonlinear 
nonstationary, 32 
reversible, 300 
stationary, 32 
covariance, 32 
uncorrelated, 34, 100, 113 
see also ARCH model and 
ARMA process and 
and stylized facts, 271, 279 
autocorrelations, 280-2, 295-6 
of volatility, 273, 282, 297 
299-301 
conditional variance, 270, 274 
contemporaneous, 293, 295 
definitions, 268-70 
distribution of volatility, 278 
factors, 299 
general, 293-6 
independent, 269-93 
log-likelihood function, 283, 285 
Monte Carlo, 286-7 
maximum likelihood estimate, 
covariance matrix, 286 
quasi-, 285-6, 292, 295, 298 
moments, 270, 280, 294 
option price, see option price, 
parameter estimation, 275, 283-8, 
with t-distributions, 291-3 
standardized prediction error, 288 
surveys, 267-8 
time series of volatility estimates, 
stylized facts, 
tail index, 345, 464 
tail-thickness parameter, 218 
tax selling hypothesis, 64 
technical analysis, 157 
thin trading, 12, 80, 134 
tick size, 78, 310 
time between trades, see duration 
time deformation, 268, 325, 364 
ϑ-time, 321, 323 
time series, 9, 30 
examples for returns, 13, 19 
size, 11 
trading costs, 175, 182 
breakeven, 176-8 
commission, 178 
futures, 179, 181 